Workflow
Overview and terms
QualityRules is a matching engine that applies match/trigger rules to a DataFrame. When a rule evaluates to passed (i.e. it matches or triggers), output SQL is run.
Only one trigger rule may produce output, so salience is used as a tie-breaker: the lowest salience wins.
Aim to have unique salience values for tie-breaking.
If multiple trigger rules with the same salience all trigger, the "winning" output is chosen non-deterministically, so choose your salience wisely.
An alternative way to think of this: the trigger rules are your if and the output expressions are your then. From a logic perspective it may be helpful to think of output expressions as verbs: when this is true, do that.
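The salience behaviour described above can be sketched with a small toy model. This is plain Python, not the QualityRules API: `TriggerRule`, `evaluate` and `duplicate_saliences` are hypothetical names made up for illustration, and the rules and field names are invented. It only shows that, among the trigger rules that pass, the one with the lowest salience supplies the output, and why duplicate salience values are worth checking for.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class TriggerRule:
    """Hypothetical stand-in for a trigger rule paired with its output expression."""
    name: str
    salience: int
    trigger: Callable[[Row], bool]  # the "if"
    output: Callable[[Row], Any]    # the output verb to run when the trigger passes


def evaluate(row: Row, rules: List[TriggerRule]) -> Optional[Any]:
    """Return the output of the passing rule with the lowest salience, or None."""
    passed = [rule for rule in rules if rule.trigger(row)]
    if not passed:
        return None
    # Python's min() happens to break ties by list order; the text above says
    # the real engine gives no such guarantee, so do not rely on ties.
    winner = min(passed, key=lambda rule: rule.salience)
    return winner.output(row)


def duplicate_saliences(rules: List[TriggerRule]) -> List[int]:
    """List salience values shared by more than one rule (possible ambiguous ties)."""
    seen, dupes = set(), set()
    for rule in rules:
        (dupes if rule.salience in seen else seen).add(rule.salience)
    return sorted(dupes)


rules = [
    TriggerRule("large_trade", 10, lambda r: r["amount"] > 1000, lambda r: "review"),
    TriggerRule("any_trade", 100, lambda r: r["amount"] > 0, lambda r: "accept"),
]

result_large = evaluate({"amount": 5000}, rules)  # both rules trigger; salience 10 wins
result_small = evaluate({"amount": 50}, rules)    # only "any_trade" triggers
result_none = evaluate({"amount": 0}, rules)      # nothing triggers, so no output
```

Running a check like `duplicate_saliences` before deploying a rule set is one way to catch ties early, rather than discovering them through inconsistent output.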
Suggested approach to QualityRules management
- Keep unrelated rules in their own RuleSuites, making things easier to reason about
- Make commonly used lambdas or output expressions global
- Use descriptive verbs for your output expressions
- Keep duplication or complexity in lambdas
- Only use fields that change as parameters to those lambdas
- Always start with test data you want to match against and your expected output
- Run all test cases for your RuleSuite after any change; don't assume that because your rule worked, the others haven't stopped working
- Use the validation and documentation functionality to document your lambdas and verify you've not made simple mistakes - Spark errors aren't always easy to understand
This could be visualised as such:
Don't repeat yourself
If you are typing the same trigger rule, output expression or even lambda text repeatedly, make another lambda and consider making it global.
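As a rough illustration of the advice in this section, the sketch below factors repeated trigger logic into one shared helper whose parameters are only the fields that change, and starts from test data with expected output that is re-run in full after every change. It is a plain Python toy model, not the QualityRules API; `in_range`, `RULES`, `TEST_CASES` and the `age` field are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def in_range(field: str, low: float, high: float) -> Callable[[Row], bool]:
    """Shared 'lambda': only the field name and bounds vary between rules."""
    return lambda row: low <= row[field] <= high


# (salience, trigger, output verb): a hypothetical rule set built from the helper.
RULES: List[Tuple[int, Callable[[Row], bool], str]] = [
    (10, in_range("age", 0, 17), "flag_minor"),
    (20, in_range("age", 18, 120), "accept_adult"),
]


def evaluate(row: Row) -> Optional[str]:
    """The lowest-salience passing rule supplies the output verb."""
    passed = [(salience, verb) for salience, trigger, verb in RULES if trigger(row)]
    return min(passed)[1] if passed else None


# Test data and expected output come first; rules are then written to satisfy them.
TEST_CASES: List[Tuple[Row, Optional[str]]] = [
    ({"age": 12}, "flag_minor"),
    ({"age": 40}, "accept_adult"),
    ({"age": 300}, None),
]


def failing_cases() -> List[Tuple[Row, Optional[str], Optional[str]]]:
    """Run every test case and return (row, expected, actual) for each mismatch."""
    failures = []
    for row, expected in TEST_CASES:
        actual = evaluate(row)
        if actual != expected:
            failures.append((row, expected, actual))
    return failures


failures = failing_cases()  # re-run the whole list after any rule change
```

Because every rule reuses `in_range`, a fix to the range logic lands in one place, and re-running the full `TEST_CASES` list shows whether that fix changed the outcome of any other rule.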
Created: December 3, 2024 17:06:32