# Quality - 0.2.0-preview1.1
Coverage

| Statement | Branch |
| --- | --- |
| 88.87% | 82.32% |
Run complex data quality and transformation rules using simple SQL in a batch or streaming Spark application, at scale.

Write rules in simple SQL, or create re-usable functions via SQL lambdas. Your rules are just versioned data: store them wherever is convenient and use them by simply defining a column.
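The "rules are just versioned data" idea can be illustrated with a minimal, library-agnostic sketch: each rule is a plain record (id, version, predicate) that could live in any store, and running a rule set over a row yields one predictable result column keyed by rule id. The record shape and names below are illustrative assumptions, not the Quality library's API.

```python
# Minimal sketch: rules as versioned data, results in a single map-like column.
# This is NOT the Quality library's API, just an illustration of the idea.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    rule_id: str
    version: int
    predicate: Callable[[dict], bool]  # stand-in for a SQL expression

def run_rules(row: dict, rules: list) -> dict:
    # All results land in one predictable column keyed by rule id.
    return {r.rule_id: r.predicate(row) for r in rules}

rules = [
    Rule("non_negative_amount", 1, lambda row: row["amount"] >= 0),
    Rule("known_currency", 2, lambda row: row["currency"] in {"EUR", "USD"}),
]

result = run_rules({"amount": 42.0, "currency": "EUR"}, rules)
# result == {"non_negative_amount": True, "known_currency": True}
```

Because the rules are data rather than code, bumping a rule's version or swapping its predicate needs no application redeploy, which is the property the paragraph above describes.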
- Spark 4 Connect support
- Folder can use a DefaultProcessor; both Folder and Engine now use the improved collectRunner result-processing logic
- RuleSuiteGroups: manage a single group of rules by name, and use it to access ruleSuites in nested runners and to group the results
Rules are evaluated lazily during Spark actions, such as writing a row, with results saved in a single predictable column.
## Enhanced Spark Functionality
- Lambda Functions - user-provided, re-usable SQL functions over late-bound columns
- Map lookup expressions for exact lookups and contains tests; using broadcast variables on Classic and Variables on Connect under the hood, they are a great fit for small reference data sets
- View loading - manage the use of session views in your application through configuration and a pluggable DataFrameLoader
- Aggregate functions over Maps, expandable with simple SQL Lambdas
- Row ID expressions, including guaranteed-unique row IDs (based on MAC address guarantees)
- Fast PRNGs exposing RandomSource, allowing pluggable and stable generation across the cluster
- Support for massive Bloom Filters that retain their FPP (e.g. several billion items at 0.001 would not fit into a normal 2 GB byte array) on Spark Classic
Plus a collection of handy functions to integrate it all.
Created: March 8, 2026 14:46:46