
Quality - 0.2.0-RC6

Coverage: Statement 88.54, Branch 82.13

Run complex data quality and transformation rules using simple SQL in a batch or streaming Spark application at scale.

Write rules using simple SQL or create re-usable functions via SQL Lambdas.

Your rules are just versioned data: store them wherever is convenient and use them simply by defining a column.
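Here is a minimal sketch of the "rules are just versioned data" idea. It assumes each rule is stored as a row holding an id, a version and its SQL text, and keeps only the newest version of each rule. `RuleRow` and `RuleStore.latestVersions` are hypothetical names for illustration, not Quality's own types.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical storage shape for one rule; not a Quality type. */
record RuleRow(int id, int version, String sql) {}

final class RuleStore {
    private RuleStore() {}

    /** Keeps the highest version of each rule id, returned in ascending id order. */
    static List<RuleRow> latestVersions(Collection<RuleRow> rows) {
        Map<Integer, RuleRow> latest = new TreeMap<>();
        for (RuleRow row : rows) {
            // merge keeps whichever of the two rows has the larger version number
            latest.merge(row.id(), row, (a, b) -> a.version() >= b.version() ? a : b);
        }
        return new ArrayList<>(latest.values());
    }
}
```

Because the rules are ordinary rows, the same selection could equally be a SQL query over whichever table or file holds them.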

  • 🆕 Spark 4.1 and 4.x Connect Support
  • 🆕 Folder can use a DefaultProcessor; both Folder and Engine now use the improved collectRunner result-processing logic
  • 🆕 RuleSuiteGroups: manage a single group of rules by name, use it to access ruleSuites in nested runners, and group the results
  • 🆕 Improved compilation performance for large-scale RuleSuites through separate compilation
  • 🆕 Experimental, optional support for optimised large-scale rules (>20k RuleSuites) via TriggerGrouper, with 2x speed improvements and lower memory requirements

Rules are evaluated lazily during Spark actions, such as writing a row, with results saved in a single predictable column.
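Lazy evaluation with one result column can be pictured through an analogy with Java's lazy streams. Attaching the rules only builds a pipeline, and each rule runs once per row when a terminal operation pulls the rows, much as a Spark action such as a write does. Each row gets a single result value, here the list of failed rule ids. `LazyRules` and its record are hypothetical names, and the result shape is an assumption rather than Quality's actual result type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.stream.Stream;

/** Analogy only: a lazy Java Stream stands in for a Spark plan; this is not Quality's API. */
final class LazyRules {
    private LazyRules() {}

    /** A row paired with its single (hypothetical) result value: the ids of the rules it failed. */
    record Checked(Map<String, Object> row, List<Integer> failedRuleIds) {}

    static Stream<Checked> withResults(Stream<Map<String, Object>> rows,
                                       Map<Integer, Predicate<Map<String, Object>>> rules,
                                       AtomicInteger evaluations) {
        // map is an intermediate operation: no rule runs until a terminal operation pulls rows
        return rows.map(row -> {
            List<Integer> failed = new ArrayList<>();
            rules.forEach((id, rule) -> {
                evaluations.incrementAndGet();
                if (!rule.test(row)) failed.add(id);
            });
            return new Checked(row, failed);
        });
    }
}
```

In Spark, the counterpart of the terminal operation is an action. Until an action runs, adding the result column only extends the plan.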

Databricks 18 changes its release process

Per this link, Databricks will no longer have minor releases. It is unclear at this time (28.05.2026) how this will be supportable, given the high degree of change typically found within a single release over time, let alone between minor releases.

Enhanced Spark Functionality

  • Lambda Functions - user-provided, reusable SQL functions over late-bound columns
  • Map lookup expressions for exact lookups and contains tests. They use broadcast variables on Classic and Variables on Connect under the hood, making them a great fit for small reference data sets
  • View loading - manage the use of session views in your application through configuration and a pluggable DataFrameLoader
  • Aggregate functions over Maps, expandable with simple SQL Lambdas
  • Row ID expressions, including guaranteed-unique row IDs (based on MAC address guarantees)
  • Fast PRNGs exposing RandomSource, allowing pluggable and stable generation across the cluster

Plus a collection of handy functions to integrate it all.
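The sketch below shows, in plain Java, the two operations offered by the map lookup expressions: an exact lookup and a contains test, both over small named reference maps. On a cluster the maps would be shipped to executors; in this sketch an immutable in-memory copy stands in for that. `RefLookup` and its methods are hypothetical names, not Quality's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for broadcast reference maps; not Quality's API. */
final class RefLookup {
    private final Map<String, Map<String, String>> maps;

    RefLookup(Map<String, Map<String, String>> maps) {
        // immutable copies: reference data should not change once shipped to executors
        Map<String, Map<String, String>> copy = new HashMap<>();
        maps.forEach((name, m) -> copy.put(name, Map.copyOf(m)));
        this.maps = Map.copyOf(copy);
    }

    /** Exact lookup: the mapped value, or empty when the map or the key is absent. */
    Optional<String> lookup(String mapName, String key) {
        return Optional.ofNullable(maps.getOrDefault(mapName, Map.of()).get(key));
    }

    /** Contains test: whether the key is present in the named map. */
    boolean contains(String mapName, String key) {
        return maps.getOrDefault(mapName, Map.of()).containsKey(key);
    }
}
```

Keeping the data small and immutable is what makes this style of lookup cheap to replicate to every executor.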
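Aggregating over Maps can be sketched as a fold. Every row carries a map, and values sharing a key are combined with a caller-supplied lambda. In Quality that combining logic is supplied as a SQL Lambda instead. `MapAggregate.aggregate` is a hypothetical helper for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical helper: folds per-row maps into one map; not Quality's API. */
final class MapAggregate {
    private MapAggregate() {}

    static <K, V> Map<K, V> aggregate(Iterable<? extends Map<K, V>> rows, BinaryOperator<V> combine) {
        Map<K, V> acc = new HashMap<>();
        for (Map<K, V> row : rows) {
            // Map.merge inserts new keys as-is and calls combine only for keys already seen
            row.forEach((k, v) -> acc.merge(k, v, combine));
        }
        return acc;
    }
}
```

Swapping the lambda, for example a sum for a max, changes the aggregate without touching the folding logic.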
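The next sketch shows one way MAC-based uniqueness can work, under stated assumptions:

- every machine has a distinct MAC address;
- there are fewer than 65,536 partitions;
- at most one generator is started per partition on a machine in any given millisecond.

Under those assumptions the tuple (MAC, partition, start time, counter) never repeats. This is an illustrative scheme only, since Quality's actual ID layout is not described here, and `MacRowIds` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative scheme only: not Quality's actual row ID layout. */
final class MacRowIds {
    /** node = 48-bit MAC above a 16-bit partition id; start = generator start time; seq = counter. */
    record RowId(long node, long start, long seq) {}

    private final long node;
    private final long startMillis;
    private final AtomicLong counter = new AtomicLong();

    MacRowIds(byte[] mac, int partitionId, long startMillis) {
        if (mac.length != 6) throw new IllegalArgumentException("expected a 6-byte MAC address");
        long macBits = 0;
        for (byte b : mac) {
            macBits = (macBits << 8) | (b & 0xFF); // & 0xFF stops sign extension of negative bytes
        }
        // assumption: partition ids fit in 16 bits, so they cannot overlap the MAC bits
        this.node = (macBits << 16) | (partitionId & 0xFFFF);
        this.startMillis = startMillis;
    }

    /** Each call returns a new id; the counter keeps ids from one generator distinct. */
    RowId next() {
        return new RowId(node, startMillis, counter.getAndIncrement());
    }
}
```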
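Stable generation across a cluster usually means each partition's random stream is a pure function of a job-wide seed and the partition id. Re-running a task then reproduces the same values. The sketch below illustrates that with a hand-written SplitMix64, chosen so the example has no dependencies. It assumes `RandomSource` refers to Apache Commons RNG's enum of generators, and that a real job would pick one of those instead. `SplitMix64.forPartition` is a hypothetical seeding helper, not Quality's API.

```java
/** Hand-written SplitMix64 for illustration; the partition seeding is a hypothetical scheme. */
final class SplitMix64 {
    private long state;

    SplitMix64(long seed) {
        this.state = seed;
    }

    /** Standard SplitMix64 step: advance by the golden-ratio increment, then mix the bits. */
    long nextLong() {
        long z = (state += 0x9E3779B97F4A7C15L);
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Derives a reproducible per-partition generator from one job-wide seed. */
    static SplitMix64 forPartition(long jobSeed, int partitionId) {
        // one mixing step spreads (seed, partition) pairs so neighbouring partitions get unrelated seeds
        long mixed = new SplitMix64(jobSeed ^ ((long) partitionId << 32)).nextLong();
        return new SplitMix64(mixed);
    }
}
```

Plugging in a different generator only changes the class constructed inside the helper. The deterministic seeding rule is what keeps results stable across retries.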


Last update: June 7, 2026 12:43:20
Created: June 7, 2026 12:43:20