package functions
A collection of the Quality Spark expressions for use in select(Column*).
Inheritance (linearization):
- functions
- YamlFunctionImports
- MapLookupFunctionImports
- AggregateFunctionImports
- StructFunctionsImport
- HashRelatedFunctionImports
- LongPairImports
- RngFunctionImports
- RuleRunnerFunctionImports
- PackIdImports
- RuleResultImport
- StripResultTypesFunction
- GenericLongBasedImports
- GuaranteedUniqueIDImports
- ComparableMapsImports
- AnyRef
- Any
Type Members
- final class EmptyToGenerateScalaDocs extends AnyRef
Forces scaladoc to generate the package - https://github.com/scala/bug/issues/8124
- Attributes
- protected[quality]
Value Members
- def agg_expr(sumType: DataType, filter: Column, sum: SumExpression, result: ResultsExpression): Column
Creates an aggregate by applying filter to rows, calling sum with a starting value (provided by zero) and finally calling result to process the sum and count values for a final result.
Note: when working with lambdas in the DSL it's often required to use the DataFrame's col function, as the scope is incorrect in the lambda.
- sumType
the type used to sum across rows
- filter
filter only input rows interesting to count (similar to CountIf, SumIf)
- sum
adds to the current sum, taking the current sum as its parameter; sum_with, inc, map_with etc. can be used as implementations
- result
processes the sum result and row count (after filtering) to produce the final result of the aggregate
- Definition Classes
- AggregateFunctionImports
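As an illustration of the filter/sum/result contract, a plain-Scala model of the semantics (not the Spark implementation; the names here are hypothetical) might look like:

```scala
// Plain-Scala model of agg_expr's semantics, NOT the library implementation:
// filter rows, fold a sum across the kept rows, then let `result` combine
// the final sum with the filtered row count (as meanf does).
def aggExprModel[A](rows: Seq[A])(filter: A => Boolean)(
    sum: (Long, A) => Long)(result: (Long, Long) => Double): Double = {
  val kept = rows.filter(filter)
  val summed = kept.foldLeft(0L)(sum)
  result(summed, kept.size.toLong)
}

// Mean of values under 100, mirroring filter + sum_with + meanf
val mean = aggExprModel(Seq(1, 5, 10, 200))(_ < 100)((acc, v) => acc + v)(
  (s, c) => s.toDouble / c)
```

The real agg_expr evaluates these pieces as Spark Columns across a DataFrame rather than a local Seq, but the shape of the three expressions is the same.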
- def as_uuid(lower: Column, higher: Column): Column
Converts a lower and higher pair of longs into a uuid string
- def comparable_maps(map: Column): Column
Efficiently converts the map column to struct for comparison, unioning, sorting etc.
- Definition Classes
- ComparableMapsImports
- val default_rule: Column
The default_rule value
- Definition Classes
- RuleRunnerFunctionImports
- def digest_to_longs(digestImpl: String, cols: Column*): Column
Converts columns into a digest via the MessageDigest digestImpl
- returns
array of long
- Definition Classes
- HashRelatedFunctionImports
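The array-of-longs result shape can be illustrated with plain JDK classes; the library's exact byte-to-long mapping is an assumption here, not confirmed by this doc:

```scala
import java.nio.ByteBuffer
import java.security.MessageDigest

// Illustration of viewing a MessageDigest result as longs (i0, i1, ...);
// the library's actual byte ordering is an assumption.
val digest = MessageDigest.getInstance("MD5").digest("a,b,c".getBytes("UTF-8"))
val buf = ByteBuffer.wrap(digest)
val longs = Array.fill(digest.length / 8)(buf.getLong) // MD5: 128 bits -> 2 longs
```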
- def digest_to_longs_struct(digestImpl: String, cols: Column*): Column
Converts columns into a digest via the MessageDigest digestImpl
- returns
struct with fields i0, i1, i2 etc.
- Definition Classes
- HashRelatedFunctionImports
- val disabled_rule: Column
The disabled_rule value
- Definition Classes
- RuleRunnerFunctionImports
- def drop_field(update: Column, fieldNames: String*): Column
Drops a field from a structure
- val failed: Column
The failed value
- Definition Classes
- RuleRunnerFunctionImports
- def fieldBasedIDF(func: String, prefix: String, digestImpl: String, cols: Column*): Column
- Attributes
- protected
- Definition Classes
- HashRelatedFunctionImports
- def field_based_id(prefix: String, digestImpl: String, cols: Column*): Column
Creates an id from fields using MessageDigests
- Definition Classes
- HashRelatedFunctionImports
- def flatten_folder_results(result: Column): Column
Flattens folder results, unpacking the nested structure into a simple relation
- Definition Classes
- RuleRunnerFunctionImports
- def flatten_results(result: Column): Column
Flattens DQ results, unpacking the nested structure into a simple relation
- Definition Classes
- RuleRunnerFunctionImports
- def flatten_rule_results(result: Column): Column
Flattens rule results, unpacking the nested structure into a simple relation
- Definition Classes
- RuleRunnerFunctionImports
- def from_yaml(yaml: Column, dataType: DataType): Column
Converts yaml expressions to spark native types
- dataType
the yaml's data type
- Definition Classes
- YamlFunctionImports
- def group_results(runnerResults: Column, processResult: Option[(Column) => Column] = None): Column
Groups runnerResults which represent the result column from an array of rule runners, expressionRunner excluded, or an array of RuleSuiteGroupResults. Note many of the results will contain nullable fields, see GroupResultsTest for example result types with Scala Encoders. The allowed input types are:
- array(ruleRunner DQ results) which returns RuleSuiteGroupResults
- array(ruleEngineResults) which returns either (RuleSuiteGroupResults, array((salientRule, result))) or (RuleSuiteGroupResults, array((salientRule, array((salience, result))))) for debug
- array(ruleFolderResults) which returns either (RuleSuiteGroupResults, array(result)) or (RuleSuiteGroupResults, array(array((salience, result)))) for debug
- array(collectRunner results) which returns (RuleSuiteGroupResults, array(result)); you can also use processResult to flatten
- array(group_results(array(group_results(collect_runner ... will return (RuleSuiteGroupResults, array(array(array(X))))
- runnerResults
the runner result column
- processResult
a lambda to process the result column; not applicable to DQ results. This removes the need for an additional select/projection to process results, and is ideal for calling flatten on results
- Definition Classes
- RuleResultImport
- def hashF(func: String, digestImpl: String, cols: Column*): Column
- Attributes
- protected
- Definition Classes
- HashRelatedFunctionImports
- def hash_field_based_id(prefix: String, digestImpl: String, cols: Column*): Column
Creates an id from fields using Guava Hashers
- Definition Classes
- HashRelatedFunctionImports
- def hash_with(digestImpl: String, cols: Column*): Column
Converts columns into a digest using Guava Hashers
- returns
array of long
- Definition Classes
- HashRelatedFunctionImports
- def hash_with_struct(digestImpl: String, cols: Column*): Column
Converts columns into a digest using Guava Hashers
- returns
struct with fields i0, i1, i2 etc.
- Definition Classes
- HashRelatedFunctionImports
- def id_base64(idFields: Column*): Column
Converts either a single ID or individual field parts into base64. The parts must be provided in the correct order, base, i0, i1.. iN
- Definition Classes
- GuaranteedUniqueIDImports
- def id_equal(aPrefix: String, bPrefix: String): Column
Similar to long_pair_equal but against 160-bit IDs.
- def id_from_base64(base64: Column, size: Int = 2): Column
Given a base64 string convert to an ID, use id_size to understand how large IDs could be.
- size
defaults to 2 (160-bit ID)
- Definition Classes
- GuaranteedUniqueIDImports
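A hypothetical round trip of a (base, i0, i1) shaped ID through base64, using only JDK classes; the library's actual encoding and field widths are assumptions here, not taken from this doc:

```scala
import java.nio.ByteBuffer
import java.util.Base64

// Hypothetical 160-bit layout: base as an Int plus two Longs (i0, i1).
val (base, i0, i1) = (1, 42L, -7L)
val bytes = ByteBuffer.allocate(4 + 8 + 8).putInt(base).putLong(i0).putLong(i1).array()
val encoded = Base64.getEncoder.encodeToString(bytes)

// Decode back to the constituent parts
val back = ByteBuffer.wrap(Base64.getDecoder.decode(encoded))
val decoded = (back.getInt, back.getLong, back.getLong)
```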
- def id_raw_type(id: Column): Column
Returns the underlying raw type of an id (base, i0, i1 etc.) without prefixes
- Definition Classes
- GuaranteedUniqueIDImports
- def id_size(id: Column): Column
Returns the size of an underlying ID; a unique_id will have 2, other ids may have more, each further increment is another 64 bits
- Definition Classes
- GuaranteedUniqueIDImports
- val ignored_rule: Column
The ignored_rule value
- Definition Classes
- RuleRunnerFunctionImports
- def inc(incrementWith: Column): SumExpression
Adds incrementWith to the sum value
- Definition Classes
- AggregateFunctionImports
- val inc: SumExpression
Adds 1L to the sum value
- Definition Classes
- AggregateFunctionImports
- def long_pair(lower: Column, higher: Column): Column
Creates a (lower, higher) struct
- Definition Classes
- LongPairImports
- def long_pair_equal(aPrefix: String, bPrefix: String): Column
Compares aPrefix_lower = bPrefix_lower and aPrefix_higher = bPrefix_higher
- def long_pair_from_uuid(uuid: Column): Column
Creates a (lower, higher) struct from a UUID's least and most significant bits
- Definition Classes
- LongPairImports
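The (lower, higher) pair corresponds to a UUID's least and most significant bits, which can be shown with plain java.util.UUID (as_uuid mirrors the rebuild step):

```scala
import java.util.UUID

// The pair maps onto a UUID's least/most significant 64-bit halves.
val u = UUID.fromString("123e4567-e89b-12d3-a456-426614174000")
val lower  = u.getLeastSignificantBits  // the pair's "lower" long
val higher = u.getMostSignificantBits   // the pair's "higher" long
val rebuilt = new UUID(higher, lower)   // as_uuid(lower, higher) mirrors this
```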
- def map_contains(mapLookupName: String, lookupKey: Column, mapLookups: quality.MapLookups): Column
Tests if there is a stored value from a map via the name mapLookupName and 'key' lookupKey. Implementation is map_lookup.isNotNull
- Definition Classes
- MapLookupFunctionImports
- def map_lookup(mapLookupName: String, lookupKey: Column, mapLookups: quality.MapLookups): Column
Retrieves the stored value from a map via the name mapLookupName and 'key' lookupKey
- Definition Classes
- MapLookupFunctionImports
- def map_with(id: Column, sum: (Column) => Column): SumExpression
Creates an entry in a map sum with id and the result of 'sum', with the previous sum at that id as its parameter.
- sum
the parameter is the previous value of the map's id entry
- Definition Classes
- AggregateFunctionImports
- val meanf: ResultsExpression
Provides the mean (summed value / count of filtered rows)
- Definition Classes
- AggregateFunctionImports
- def murmur3ID(prefix: String, child1: Column, restOfchildren: Column*): Column
- Definition Classes
- GenericLongBasedImports
- def murmur3ID(prefix: String, children: Seq[Column]): Column
Murmur3 hash
- Definition Classes
- GenericLongBasedImports
- def pack_ints(id: Id): Column
Packs two integers into a long, typically used for versioned ids.
- Definition Classes
- PackIdImports
- def pack_ints(id: Int, version: Int): Column
Packs two integers into a long, typically used for versioned ids.
- Definition Classes
- PackIdImports
- def pack_ints(id: Column, version: Column): Column
Packs two integers into a long, typically used for versioned ids.
- Definition Classes
- PackIdImports
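The idea of packing two 32-bit values into one 64-bit long can be sketched in plain Scala; the bit layout below is a hypothetical illustration, the actual ordering used by PackIdImports is not stated in this doc:

```scala
// Hypothetical bit layout: id in the high 32 bits, version in the low 32 bits.
def packInts(id: Int, version: Int): Long =
  (id.toLong << 32) | (version.toLong & 0xFFFFFFFFL)

// Inverse: recover the (id, version) pair, as unpack does for the real packing.
def unpackLong(packed: Long): (Int, Int) =
  ((packed >>> 32).toInt, packed.toInt)

val roundTrip = unpackLong(packInts(1234, 2))
```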
- val passed: Column
The passed value
- Definition Classes
- RuleRunnerFunctionImports
- def prefixed_to_long_pair(source: Column, prefix: String): Column
Converts a prefixed long pair to lower, higher
- Definition Classes
- LongPairImports
- def probability(result: Column): Column
Returns the probability from a given rule result
- Definition Classes
- RuleRunnerFunctionImports
- def provided_id(prefix: String, child: Column): Column
Creates a hash based ID based on an upstream compatible long generator
- Definition Classes
- GenericLongBasedImports
- def results_with(result: (Column, Column) => Column): ResultsExpression
Produces an aggregate result
- result
the sum and count are parameters
- Definition Classes
- AggregateFunctionImports
- val return_both: ResultsExpression
Returns both the count and sum
- Definition Classes
- AggregateFunctionImports
- val return_sum: ResultsExpression
Returns the sum, ignoring the count
- Definition Classes
- AggregateFunctionImports
- def reverse_comparable_maps(mapStruct: Column): Column
Efficiently converts the mapStruct column to its original Map type
- Definition Classes
- ComparableMapsImports
- def rngID(prefix: String): Column
Creates a default randomRNG based on RandomSource.XO_RO_SHI_RO_128_PP
- Definition Classes
- GenericLongBasedImports
- def rng_bytes(randomSource: RandomSource = RandomSource.XO_RO_SHI_RO_128_PP, numBytes: Int = 16, seed: Long = 0): Column
Creates a random number generator using a given commons-rng source
- randomSource
commons-rng random source
- numBytes
the number of bytes to produce in the array, defaulting to 16
- seed
the seed to use / mixin
- returns
a column with the appropriate rng defined
- Definition Classes
- RngFunctionImports
- def rng_id(prefix: String, randomSource: RandomSource, seed: Long = 0L): Column
Creates a randomRNG ID based on randomSource with a given seed
- Definition Classes
- GenericLongBasedImports
- def rng_uuid(column: Column): Column
Creates a uuid from byte arrays or two longs, use with the rng() function to generate random uuids.
- Definition Classes
- RngFunctionImports
- def rule_result(ruleSuiteResults: Column, ruleSuiteId: Int, ruleSuiteVersion: Int, ruleSetId: Int, ruleSetVersion: Int, ruleId: Int, ruleVersion: Int): Column
Retrieves the rule result for a given id; the result type depends on ruleSuiteResults' type. Integer is returned for DQ checks and either String or (ruleResult: String, resultDDL: String) for ExpressionResults.
- Definition Classes
- RuleResultImport
- def rule_result(ruleSuiteResults: Column, ruleSuiteId: Column, ruleSuiteVersion: Column, ruleSetId: Column, ruleSetVersion: Column, ruleId: Column, ruleVersion: Column): Column
Retrieves the rule result for a given id; the result type depends on ruleSuiteResults' type. Integer is returned for DQ checks and either String or (ruleResult: String, resultDDL: String) for ExpressionResults.
- Definition Classes
- RuleResultImport
- def rule_result(ruleSuiteResults: Column, ruleSuiteId: Column, ruleSetId: Column, ruleId: Column): Column
Retrieves the rule result for a given id; the result type depends on ruleSuiteResults' type. Integer is returned for DQ checks and either String or (ruleResult: String, resultDDL: String) for ExpressionResults.
- Definition Classes
- RuleResultImport
- def rule_suite_result_details(result: Column): Column
Consumes a RuleSuiteResult and returns RuleSuiteDetails
- Definition Classes
- RuleRunnerFunctionImports
- def rule_suite_statistics(results: Column): Column
Aggregates over RuleSuiteResults columns and returns a RuleSuiteGroupStatistics row
- Definition Classes
- AggregateFunctionImports
- val soft_failed: Column
The soft_failed value
- Definition Classes
- RuleRunnerFunctionImports
- def strip_result_ddl(expressionResults: Column): Column
Stores only the ruleResult, removing the structure including the resultDDL column
- Definition Classes
- StripResultTypesFunction
- def sum_with(sum: (Column) => Column): SumExpression
Given the current sum, produce the next sum, for example by incrementing 1 on the sum to count filtered rows
- Definition Classes
- AggregateFunctionImports
- def to_yaml(col: Column, renderOptions: Map[String, String] = Map.empty): Column
Converts spark expressions to yaml using SnakeYAML.
- Definition Classes
- YamlFunctionImports
- def unify_result(runnerResults: Column): Column
Converts non-debug engine results into a single result column, allowing engine, collector and folder results to be nested and combined via group_results within the same nested rule suite calls. Similarly, non-debug RuleSuiteGroupResults can be converted.
- Definition Classes
- RuleResultImport
- def unique_id(prefix: String): Column
Creates a uniqueID backed by the GuaranteedUniqueID Spark Snowflake ID approach
- Definition Classes
- GuaranteedUniqueIDImports
- def unpack(packedId: Column): Column
Takes a packedId long and unpacks to id, version
- Definition Classes
- PackIdImports
- def unpack_id_triple(idTriple: Column): Column
Unpacks an IdTriple column into its six constituent integers
- Definition Classes
- PackIdImports
- def update_field(update: Column, transformations: (String, Column)*): Column
Adds fields in order; for each field path, its paired transformation is applied to the update column
- returns
a new copy of update with the changes applied
- Definition Classes
- StructFunctionsImport
- def za_field_based_id(prefix: String, digestImpl: String, cols: Column*): Column
Creates an id from fields using ZeroAllocation LongHashFactory (64-bit)
- Definition Classes
- HashRelatedFunctionImports
- def za_hash_longs_with(digestImpl: String, cols: Column*): Column
Converts columns into a digest via ZeroAllocation LongTuple Factory (128-bit)
- returns
array of long
- Definition Classes
- HashRelatedFunctionImports
- def za_hash_longs_with_struct(digestImpl: String, cols: Column*): Column
Converts columns into a digest via ZeroAllocation LongTuple Factory (128-bit)
- returns
struct with fields i0, i1, i2 etc.
- Definition Classes
- HashRelatedFunctionImports
- def za_hash_with(digestImpl: String, cols: Column*): Column
Converts columns into a digest via ZeroAllocation LongHashFactory (64-bit)
- returns
array of long
- Definition Classes
- HashRelatedFunctionImports
- def za_hash_with_struct(digestImpl: String, cols: Column*): Column
Converts columns into a digest via ZeroAllocation LongHashFactory (64-bit)
- returns
struct with fields i0, i1, i2 etc.
- Definition Classes
- HashRelatedFunctionImports
- def za_longs_field_based_id(prefix: String, digestImpl: String, cols: Column*): Column
Creates an id from fields using ZeroAllocation LongTuple Factory (128-bit)
- Definition Classes
- HashRelatedFunctionImports
Deprecated Value Members
- def fieldBasedID(prefix: String, digestImpl: String, children: Column*): Column
Creates an id from fields using MessageDigests; in line with SQL naming, please use field_based_id
- Definition Classes
- HashRelatedFunctionImports
- Annotations
- @deprecated
- Deprecated
(Since version 0.1.0) migrate to field_based_id
- def providedID(prefix: String, child: Column): Column
Creates a hash based ID based on an upstream compatible long generator; in line with SQL function naming, please migrate to provided_id
- Definition Classes
- GenericLongBasedImports
- Annotations
- @deprecated
- Deprecated
(Since version 0.1.0) migrate to provided_id, providedID will be removed in 0.3.0
- def rule_suite_statistics_aggregator(results: Column): Column
Prefer rule_suite_statistics; this function is only provided as a fallback should there be issues in the rule_suite_statistics implementation, which is over 6% faster.
Aggregates over RuleSuiteResults columns and returns a RuleSuiteGroupStatistics row using a pre-Spark 4 unified API Aggregator; per https://github.com/sparkutils/quality/issues/117 this does not run on Databricks Shared Clusters.
- Definition Classes
- AggregateFunctionImports
- Annotations
- @deprecated
- Deprecated
(Since version 0.2.0) This aggregation implementation will be removed in 0.3.0 and should only be used if rule_suite_statistics has issues