
Quality - 0.2.0-i129.further_separation

Coverage: Statement 88.48, Branch 81.90

Run complex data quality and transformation rules using simple SQL in a batch or streaming Spark application at scale.

Write rules using simple SQL or create re-usable functions via SQL Lambdas.

Your rules are just versioned data: store them wherever convenient and use them by simply defining a column.

  • 🆕 Spark 4.1 and 4.x Connect support
  • 🆕 Folder can use a DefaultProcessor; both Folder and Engine now use the improved collectRunner result processing logic
  • 🆕 RuleSuiteGroups: manage a single group of rules by name, use it to access ruleSuites in nested runners, and group the results
  • 🆕 Improved compilation performance for large scale RuleSuites through separate compilation
  • 🆕 Experimental and optional support for optimised large scale rules (>20k RuleSuites), with 2x speed improvements and lower memory requirements via TriggerGrouper

Rules are evaluated lazily during Spark actions, such as writing a row, with results saved in a single predictable column.
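To make the idea of rules as versioned data with results in one predictable structure concrete, here is a minimal sketch in plain Python. It is not Quality's API: the `Rule` class, the `latest` and `run_rules` helpers, the table layout, and the sample rules are all hypothetical, and the standard library's `sqlite3` stands in for Spark SQL purely so the example runs without a cluster.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A rule is plain data: an id, a version, and a boolean SQL expression."""
    rule_id: int
    version: int
    sql: str


# Hypothetical rules; in practice these could be loaded from any store.
RULES = [
    Rule(1, 1, "amount > 0"),
    Rule(2, 1, "currency IN ('GBP', 'USD')"),
    Rule(2, 2, "currency IN ('GBP', 'USD', 'EUR')"),
]


def latest(rules):
    """Keep only the highest version of each rule id."""
    newest = {}
    for rule in rules:
        current = newest.get(rule.rule_id)
        if current is None or rule.version > current.version:
            newest[rule.rule_id] = rule
    return sorted(newest.values(), key=lambda r: r.rule_id)


def run_rules(rows, rules):
    """Evaluate the latest version of every rule against every row.

    Returns {row id: {(rule id, version): passed}}, a stand-in for a
    single result column holding all rule outcomes for that row.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    results = {}
    for rule in latest(rules):
        # Rule text is trusted here; a real system would validate it first.
        query = f"SELECT id, {rule.sql} FROM t ORDER BY id"
        for row_id, passed in conn.execute(query):
            results.setdefault(row_id, {})[(rule.rule_id, rule.version)] = bool(passed)
    conn.close()
    return results
```

Because the rules are ordinary records, publishing version 2 of rule 2 changes the outcome for a row without touching any application code, and each result stays traceable to the exact rule version that produced it.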

Enhanced Spark Functionality

  • Lambda Functions - user-provided re-usable SQL functions over late-bound columns
  • Map lookup expressions for exact lookups and contains tests. Under the hood they use broadcast variables on Classic and Variables on Connect, which makes them a great fit for small reference data sets
  • View loading - manage the use of session views in your application through configuration and a pluggable DataFrameLoader

  • Aggregate functions over Maps expandable with simple SQL Lambdas

  • Row ID expressions including guaranteed unique row IDs (based on MAC address guarantees)

  • Fast PRNGs exposing RandomSource, allowing pluggable and stable generation across the cluster

Plus a collection of handy functions to integrate it all.
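As an illustration of what stable generation across a cluster can mean, the sketch below derives one reproducible generator per partition from a shared base seed. It uses Python's standard `random` module rather than RandomSource, and the `partition_rng` and `generate` helpers and the seed-mixing scheme are assumptions made for the example, not the scheme Quality uses.

```python
import random


def partition_rng(base_seed: int, partition_id: int) -> random.Random:
    """Derive a reproducible generator for one partition.

    A string seed is hashed deterministically by random.Random, so the
    same (base_seed, partition_id) pair always yields the same stream,
    regardless of which machine or process evaluates it.
    """
    return random.Random(f"{base_seed}:{partition_id}")


def generate(base_seed: int, partition_id: int, count: int) -> list:
    """Produce `count` floats in [0, 1) for the given partition."""
    rng = partition_rng(base_seed, partition_id)
    return [rng.random() for _ in range(count)]
```

Re-running a partition, for example after a task retry, reproduces the same values, while different partitions draw from different streams.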


Last update: May 23, 2026 18:59:09
Created: May 23, 2026 18:59:09