Those are some Quality flavours
Quality has four main flavours, with sprinklings of other Quality ingredients such as the SQL function suite.
These flavours are provided by four "runners", each of which adds a Column to a Spark Dataset/DataFrame.
Quality / QualityData - ruleRunner
Execute SQL-based data validation rules, capture all of the results, and store them with your data for fast and easy access.
Example Usage: Validating in-bound data or the results of a calculation.
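To make the idea of "capture all the results and store them with your data" concrete, here is a minimal conceptual sketch in plain Python. It is not the Quality API: the rule ids, the `DataQuality` field name, the `Passed`/`Failed` labels and the "any failure fails the row" overall result are all assumptions made for illustration, and plain Python predicates stand in for SQL rule expressions.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Rule = Callable[[Row], bool]

# Hypothetical rule ids and predicates standing in for SQL rule expressions.
RULES: Dict[str, Rule] = {
    "amount_positive": lambda row: row["amount"] > 0,
    "currency_present": lambda row: bool(row.get("currency")),
}


def run_rules(row: Row, rules: Dict[str, Rule]) -> Row:
    """Evaluate every rule and return the row with all results attached."""
    results = {
        rule_id: ("Passed" if rule(row) else "Failed")
        for rule_id, rule in rules.items()
    }
    # Assumption: the row as a whole fails as soon as any single rule fails.
    overall = "Passed" if all(r == "Passed" for r in results.values()) else "Failed"
    # The results travel with the data, like an extra column on the row.
    return {**row, "DataQuality": {"overall": overall, "rules": results}}


checked = run_rules({"amount": -5, "currency": "CHF"}, RULES)
```

Every rule is evaluated and recorded, not just the first failure, which is what allows later steps to query individual rule results without re-running them.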
What is stored:
QualityRules - ruleEngineRunner
QualityRules extends the base Quality framework to provide the ability to generate output based on a single SQL rule matching the input data; in effect, an auditable, large-scale SQL case statement.
Conceptually, Trigger rules are the "when" and Output rules are the "then", ordered by salience.
Example Usage: Derivation Logic.
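The when/then model can be sketched in plain Python as follows. This is a conceptual illustration only, not the Quality API: `EngineRule`, `run_engine` and the example rules are hypothetical names, and the sketch assumes that the lowest salience value wins among the matching triggers.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional, Tuple

Row = Dict[str, Any]


class EngineRule(NamedTuple):
    rule_id: str
    salience: int
    trigger: Callable[[Row], bool]  # the "when"
    output: Callable[[Row], Any]    # the "then"


def run_engine(row: Row, rules: List[EngineRule]) -> Optional[Tuple[str, Any]]:
    """Return (rule_id, output) for the single winning rule, or None."""
    matching = [rule for rule in rules if rule.trigger(row)]
    if not matching:
        return None
    # Assumption: the lowest salience value has the highest priority.
    chosen = min(matching, key=lambda rule: rule.salience)
    # Returning the rule id with the output keeps the decision auditable.
    return chosen.rule_id, chosen.output(row)


RULES = [
    EngineRule("default_tier", 100, lambda row: True, lambda row: "standard"),
    EngineRule("big_spender", 10, lambda row: row["spend"] > 1000, lambda row: "gold"),
]

big = run_engine({"spend": 5000}, RULES)
small = run_engine({"spend": 10}, RULES)
```

Both triggers match the first row, but only one Output is produced, just as only the first matching branch of a SQL case statement is taken.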
What is stored:
QualityFolder - ruleFolderRunner
QualityFolder extends QualityRules, providing the ability to change attribute values based on any number of SQL rules matching the input data.
Unlike QualityRules, which uses salience to select only one Output expression, Folder uses salience to order the execution of the Output expressions paired with every matching Trigger, folding the results as it goes.
Example Usage: Correcting in-bound data so that subsequent calculators can process it, defaulting, etc.
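The folding behaviour can be sketched in plain Python. Again, this is a conceptual illustration and not the Quality API: `FolderRule` and `run_folder` are hypothetical names, and the sketch assumes that triggers are evaluated against the original row and that outputs run in ascending salience order, each receiving the result of the previous one.

```python
from typing import Any, Callable, Dict, List, NamedTuple

Row = Dict[str, Any]


class FolderRule(NamedTuple):
    rule_id: str
    salience: int
    trigger: Callable[[Row], bool]
    output: Callable[[Row], Row]  # takes the folded row, returns an updated copy


def run_folder(row: Row, rules: List[FolderRule]) -> Row:
    """Apply the outputs of all matching rules, in salience order."""
    # Assumption: triggers see the original row, outputs see the folded row.
    matching = sorted(
        (rule for rule in rules if rule.trigger(row)),
        key=lambda rule: rule.salience,
    )
    folded = dict(row)
    for rule in matching:
        folded = rule.output(folded)
    return folded


RULES = [
    FolderRule("describe", 30, lambda r: True,
               lambda r: {**r, "label": f"{r['amount']} {r['currency']}"}),
    FolderRule("clamp_negative", 10, lambda r: r["amount"] < 0,
               lambda r: {**r, "amount": 0}),
    FolderRule("default_currency", 20, lambda r: r["currency"] is None,
               lambda r: {**r, "currency": "USD"}),
]

corrected = run_folder({"amount": -5, "currency": None}, RULES)
```

Because the `describe` output runs last, it sees the clamped amount and the defaulted currency, which is the difference from selecting a single Output.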
What is stored:
QualityExpressions - ExpressionRunner
QualityExpressions extends QualityRules, providing the raw results of expressions as YAML strings (with type) and allowing aggregate expressions.
Example Usage: Providing totals or other relevant aggregations over datasets or DQ results - e.g. only deem the data load correct when 90% of the rows have good DQ.
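The 90% example can be sketched as a simple aggregation over per-row results. This is a plain Python illustration rather than the Quality API: `load_is_acceptable`, the `Passed`/`Failed` labels and the decision to reject an empty load are assumptions.

```python
from typing import Iterable


def load_is_acceptable(overall_results: Iterable[str], threshold: float = 0.9) -> bool:
    """Deem a load correct when enough rows have a good DQ result."""
    results = list(overall_results)
    if not results:
        # Assumption: an empty load is treated as not acceptable.
        return False
    passed = sum(1 for result in results if result == "Passed")
    return passed / len(results) >= threshold


good_load = load_is_acceptable(["Passed"] * 9 + ["Failed"])
bad_load = load_is_acceptable(["Passed"] * 8 + ["Failed"] * 2)
```

In Spark, the same idea would be an aggregate expression over the stored results column instead of a Python loop.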
What is stored:
You can also use the typedExpressionRunner, which saves the results of expressions with the same type.
Example Usage: Instead of checking in a rule whether something exists in a view and then using the view's value in an Output expression, use typedExpressionRunner to save the lookup value directly. The rule can then check whether rule_result is null, which can noticeably speed up view-heavy queries.
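The "look up once, then test for null" pattern can be sketched in plain Python. This is an illustration only and not the Quality API: `COUNTRY_VIEW`, `with_lookup` and `country_known` are hypothetical, and a dictionary stands in for the view.

```python
from typing import Any, Dict

Row = Dict[str, Any]

# Hypothetical lookup "view", keyed by country code.
COUNTRY_VIEW = {"CH": "Switzerland", "DE": "Germany"}


def with_lookup(row: Row) -> Row:
    """Perform the lookup once and store its value alongside the row."""
    return {**row, "rule_result": COUNTRY_VIEW.get(row["country_code"])}


def country_known(row: Row) -> bool:
    """The rule only tests the stored value; it does not repeat the lookup."""
    return row["rule_result"] is not None


known = with_lookup({"country_code": "CH"})
unknown = with_lookup({"country_code": "XX"})
```

The stored value serves both purposes: its presence answers the existence check, and its content is available to later expressions without a second lookup.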
What is stored: For a type of STRUCT
Created: April 22, 2024 08:24:32