Quality - 0.2.0-i129.further_separation
Coverage
| Statement | 88.48% | Branch | 81.90% |
Run complex data quality and transformation rules using simple SQL in a batch or streaming Spark application at scale.
Write rules using simple SQL or create re-usable functions via SQL Lambdas.
Your rules are just versioned data: store them wherever convenient and use them by simply defining a column.
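To make the "rules are versioned data" idea concrete, here is a minimal sketch. It is plain Python, with the standard-library `sqlite3` module standing in for Spark SQL, and it is not the Quality API. The rule records, the `latest_rules` and `with_quality_column` helpers, the `payments` table and the `Passed`/`Failed` labels are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical rule records: (rule_id, version, SQL boolean expression).
# They are ordinary data, so they could be stored in any table or file.
RULES = [
    (1, 1, "amount > 0"),
    (2, 1, "currency IN ('GBP', 'USD')"),
    (2, 2, "currency IN ('GBP', 'USD', 'EUR')"),
]


def latest_rules(rules):
    """Keep only the highest version of each rule id, sorted by id."""
    latest = {}
    for rule_id, version, expression in rules:
        if rule_id not in latest or version > latest[rule_id][0]:
            latest[rule_id] = (version, expression)
    return sorted((rid, ver, expr) for rid, (ver, expr) in latest.items())


def with_quality_column(conn, table, rules):
    """Evaluate every rule per row and pack the outcomes into one 'quality' column."""
    parts = [
        f"'{rid}.{ver}=' || CASE WHEN ({expr}) THEN 'Passed' ELSE 'Failed' END"
        for rid, ver, expr in rules
    ]
    quality = " || ';' || ".join(parts)
    query = f"SELECT *, {quality} AS quality FROM {table} ORDER BY id"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount REAL, currency TEXT)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [(1, 10.0, "GBP"), (2, -5.0, "EUR"), (3, 3.5, "JPY")],
)

rows = with_quality_column(conn, "payments", latest_rules(RULES))
for row in rows:
    print(row)
```

The point of the sketch is the shape of the approach: the rule text lives outside the application code, a version is selected at run time, and every outcome for a row lands in one extra column alongside the data.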
- Spark 4.1 and 4.x Connect support
- Folder can use a DefaultProcessor; both Folder and Engine now use the improved collectRunner result processing logic
- RuleSuiteGroups - manage a single group of rules by name and use it to access ruleSuites in nested runners and group the results
- Improved compilation performance for large-scale RuleSuites through separate compilation
- Experimental and optional support for optimised large-scale rules (>20k RuleSuites), with 2x speed improvements and lower memory requirements via TriggerGrouper
Rules are evaluated lazily during Spark actions, such as writing a row, with results saved in a single predictable column.
Enhanced Spark Functionality
- Lambda Functions - user-provided re-usable SQL functions over late-bound columns
- Map lookup expressions for exact lookups and contains tests. Under the hood they use broadcast variables on Classic and Variables on Connect, which makes them a great fit for small reference data sets
- View loading - manage the use of session views in your application through configuration and a pluggable DataFrameLoader
- Aggregate functions over Maps, expandable with simple SQL Lambdas
- Row ID expressions, including guaranteed unique row IDs (based on MAC address guarantees)
- Fast PRNGs exposing RandomSource, allowing pluggable and stable generation across the cluster
Plus a collection of handy functions to integrate it all.
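The MAC-address-based row IDs in the list above can be pictured with a small sketch. It is plain Python rather than the library's implementation, and the bit layout (48-bit node, 64-bit start time in milliseconds, 32-bit counter) as well as the `RowIdGenerator` and `split_id` names are assumptions made for illustration. IDs built this way stay distinct as long as no two generators share both the same node address and the same start time.

```python
import itertools
import time
import uuid


class RowIdGenerator:
    """Hypothetical layout: 48-bit node | 64-bit start time (ms) | 32-bit counter."""

    def __init__(self, node=None, start_ms=None):
        # uuid.getnode() returns the hardware (MAC) address as a 48-bit integer,
        # or a random 48-bit number when no address can be determined.
        self.node = uuid.getnode() if node is None else node
        self.start_ms = int(time.time() * 1000) if start_ms is None else start_ms
        self._counter = itertools.count()

    def next_id(self):
        """Return the next ID; the counter makes IDs from one generator distinct."""
        count = next(self._counter)
        if count >= 1 << 32:
            raise OverflowError("counter exhausted; create a new generator")
        return (self.node << 96) | (self.start_ms << 32) | count


def split_id(row_id):
    """Decompose an ID back into (node, start_ms, counter)."""
    node = row_id >> 96
    start_ms = (row_id >> 32) & ((1 << 64) - 1)
    counter = row_id & ((1 << 32) - 1)
    return node, start_ms, counter


# Fixed inputs keep the example reproducible; omit them to use the real
# node address and the current time.
gen = RowIdGenerator(node=0x001122334455, start_ms=1_700_000_000_000)
ids = [gen.next_id() for _ in range(1000)]
```

Putting the node address in the highest bits means generators on different machines can never collide, whatever their counters or start times, which is the intuition behind tying uniqueness to the MAC address.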
Last update: May 23, 2026 18:59:09
Created: May 23, 2026 18:59:09