Reading & Writing RuleSuites
Typically you'd save the RuleSuite in configuration tables within a database, Delta, or some other easy-to-edit store.
// The lambda functions from the RuleSuite
val lambdaDF = toLambdaDS(rules)
lambdaDF.write .....
// The rest of the rules
val ruleDF = toRuleSuiteDF(rules)
ruleDF.write .....
The field names used follow the convention of the default Product Encoder but can be renamed as desired.
Similarly, reading the rules can be as simple as:
val rereadWithoutLambdas = readRulesFromDF(ruleDF,
val reReadLambdas = readLambdasFromDF(lambdaDF.toDF(),
val reReadRuleSuite = integrateLambdas(rereadWithoutLambdas, reReadLambdas)
The column names used during reading are not assumed and must be specified.
Versioned rule datasets
The user is completely free to choose their own version management approach, but the design is aimed at immutability and evidencing.
To make things easy, a simple scheme with library functions is provided in the simpleVersioning package:
Rules can be added to existing RuleSets (or indeed to new RuleSets) with just a single row in the input DF. That row must increase both the RuleSet and the RuleSuite version:
ruleSuiteId  ruleSuiteVersion  ruleSetId  ruleSetVersion  ruleId  ruleVersion  ruleExpr
1            1                 1          1               1       1            /* existing rule rows */ true()
1            2                 1          2               2       1            /* new rule */ failed()
Similarly, you can change a rule by adding a new row that increments the Rule, RuleSet and RuleSuite versions:
ruleSuiteId  ruleSuiteVersion  ruleSetId  ruleSetVersion  ruleId  ruleVersion  ruleExpr
1            1                 1          1               1       1            /* existing rule row */ true()
1            2                 1          2               1       2            /* new version of the above rule */ failed()
To delete a rule you can either use disabled() to flag the rule as inactive or DELETED to flag that the rule is to be removed from a RuleSet. As before, each version must be incremented:
ruleSuiteId  ruleSuiteVersion  ruleSetId  ruleSetVersion  ruleId  ruleVersion  ruleExpr
1            1                 1          1               1       1            /* existing rule row */ true()
1            2                 1          2               1       2            DELETED
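The add, change and delete rows above can be read as "latest row per rule wins, DELETED removes it". The following is a minimal sketch of that reading using plain Scala collections, not the library's implementation. `RuleRow`, `effectiveRules` and the second sample rule are hypothetical names and data introduced only for illustration, and the sketch assumes that a suite version sees every row at or below its own version.

```scala
// Hypothetical row shape mirroring the table columns; not a library type.
final case class RuleRow(ruleSuiteId: Int, ruleSuiteVersion: Int,
                         ruleSetId: Int, ruleSetVersion: Int,
                         ruleId: Int, ruleVersion: Int, ruleExpr: String)

object VersionedRulesSketch {
  /** Rules in effect for one suite version (assumed semantics):
    * the highest ruleVersion per (ruleSetId, ruleId) among rows at or
    * below the requested suite version, with DELETED rows dropped. */
  def effectiveRules(rows: Seq[RuleRow], suiteId: Int, suiteVersion: Int): Seq[RuleRow] =
    rows
      .filter(r => r.ruleSuiteId == suiteId && r.ruleSuiteVersion <= suiteVersion)
      .groupBy(r => (r.ruleSetId, r.ruleId))
      .values
      .map(_.maxBy(_.ruleVersion))
      .filter(_.ruleExpr != "DELETED")
      .toSeq
      .sortBy(r => (r.ruleSetId, r.ruleId))

  val rows: Seq[RuleRow] = Seq(
    RuleRow(1, 1, 1, 1, 1, 1, "true()"),  // existing rule row
    RuleRow(1, 1, 1, 1, 2, 1, "true()"),  // hypothetical second rule, added for illustration
    RuleRow(1, 2, 1, 2, 1, 2, "DELETED")  // suite version 2 removes rule 1
  )
}
```

With these rows, suite version 1 would contain rules 1 and 2, while suite version 2 would contain only rule 2.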
OutputExpressions may be re-used with different versions (be it for QualityRules or QualityFolder). Each rule row that needs to use a later OutputExpression must increment all of its Id versions. You are advised to use lambdas to soften the impact:
ruleSuiteId  ruleSuiteVersion  ruleSetId  ruleSetVersion  ruleId  ruleVersion  ruleExpr  ruleEngineSalience  ruleEngineId  ruleEngineVersion
1            1                 1          1               1       1            true()    60                  100           1
1            2                 1          2               1       2            true()    60                  100           2
Lambda Expressions for a RuleSuite simply take the latest version for a given lambda id. If you want to delete a lambda (for example, you have used a name that is now an official Spark SQL function) you can add a DELETED row for a given RuleSuite with a higher version.
ruleSuiteId  ruleSuiteVersion  name     functionId  functionVersion  ruleExpr
1            1                 aToTrue  1           1                /** oops */ a -> a
1            1                 always1  2           1                a -> 1
1            2                 aToTrue  1           2                /** corrected */ a -> true()
1            2                 always1  2           2                DELETED
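As a rough illustration of "take the latest version for a given lambda id", the sketch below resolves the four rows of the table above with plain Scala collections. It is not the library's code: `LambdaRow` and `effectiveLambdas` are hypothetical names, the comments inside the expressions are omitted, and the sketch assumes that a suite version sees every lambda row at or below its own version.

```scala
// Hypothetical row shape mirroring the lambda table columns; not a library type.
final case class LambdaRow(ruleSuiteId: Int, ruleSuiteVersion: Int, name: String,
                           functionId: Int, functionVersion: Int, ruleExpr: String)

object VersionedLambdasSketch {
  /** Lambda name -> expression for one suite version (assumed semantics):
    * the highest functionVersion per functionId wins, DELETED removes it. */
  def effectiveLambdas(rows: Seq[LambdaRow], suiteId: Int, suiteVersion: Int): Map[String, String] =
    rows
      .filter(r => r.ruleSuiteId == suiteId && r.ruleSuiteVersion <= suiteVersion)
      .groupBy(_.functionId)
      .values
      .map(_.maxBy(_.functionVersion))
      .filter(_.ruleExpr != "DELETED")
      .map(r => r.name -> r.ruleExpr)
      .toMap

  val rows: Seq[LambdaRow] = Seq(
    LambdaRow(1, 1, "aToTrue", 1, 1, "a -> a"),       // the "oops" version
    LambdaRow(1, 1, "always1", 2, 1, "a -> 1"),
    LambdaRow(1, 2, "aToTrue", 1, 2, "a -> true()"),  // corrected version
    LambdaRow(1, 2, "always1", 2, 2, "DELETED")       // removed in suite version 2
  )
}
```

Under these assumptions suite version 1 keeps both lambdas, and suite version 2 keeps only the corrected aToTrue.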
To use these you replace the above with:
import com.sparkutils.quality._
import simpleVersioning._
val rereadWithoutLambdas = readVersionedRulesFromDF(ruleDF,
val reReadLambdas = readVersionedLambdasFromDF(lambdaDF.toDF(),
val outputExpressions = readVersionedOutputExpressionsFromDF(outputDF,
val rereadWithLambdas = integrateVersionedLambdas(rereadWithoutLambdas, reReadLambdas)
val (reread, missingOutputExpressions) = integrateVersionedOutputExpressions(rereadWithLambdas, outputExpressions)
The "readVersioned" functions modify the dataframe per the above logic to create full sets of ruleSuiteId + ruleSuiteVersion pairs.
The "integrateVersioned" functions will first try the same ruleSuiteId + ruleSuiteVersion pair and, where it is not present, will take the next lowest available version. This runs on the assumption that if you didn't need to change any OutputExpressions for a new RuleSuite version, you shouldn't need to create fake entries.
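The fallback to the next lowest available version can be pictured with a small lookup over (ruleSuiteId, ruleSuiteVersion) keys. This is only a sketch of the idea in plain Scala, not how the library implements it; `resolve` and the sample values are hypothetical.

```scala
object VersionFallbackSketch {
  /** Returns the entry for the exact (suiteId, suiteVersion) pair if present,
    * otherwise the entry with the closest lower version for the same suiteId. */
  def resolve[A](available: Map[(Int, Int), A], suiteId: Int, suiteVersion: Int): Option[A] = {
    // Keep only versions of this suite that are not newer than the requested one.
    val candidates = available.collect {
      case ((id, version), value) if id == suiteId && version <= suiteVersion =>
        version -> value
    }
    if (candidates.isEmpty) None else Some(candidates.maxBy(_._1)._2)
  }

  // Hypothetical data: OutputExpressions were only supplied for suite versions 1 and 3.
  val outputExpressions: Map[(Int, Int), String] =
    Map((1, 1) -> "v1 outputs", (1, 3) -> "v3 outputs")
}
```

Here a request for suite version 2 falls back to the version 1 entry, so no placeholder row is needed for version 2, while a request for an unknown ruleSuiteId yields nothing.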
Created: January 16, 2025 13:21:08