package functions
A collection of the Quality Spark expressions for use in select(Column*).
Inheritance (linearization):
- functions
- YamlFunctionImports
- MapLookupFunctionImports
- AggregateFunctionImports
- StructFunctionsImport
- HashRelatedFunctionImports
- LongPairImports
- RngFunctionImports
- RuleRunnerFunctionImports
- PackIdImports
- RuleResultImport
- StripResultTypesFunction
- GenericLongBasedImports
- GuaranteedUniqueIDImports
- ComparableMapsImports
- AnyRef
- Any
Type Members
- final class EmptyToGenerateScalaDocs extends AnyRef
Forces scaladoc to generate the package - https://github.com/scala/bug/issues/8124
- Attributes
- protected[quality]
Value Members
- def agg_expr(sumType: DataType, filter: Column, sum: SumExpression, result: ResultsExpression): Column
Creates an aggregate by applying filter to rows, calling sum with a starting value (provided by zero) and finally calling result to process the sum and count values for a final result.
Note: when working with lambdas in the DSL it's often required to use the DataFrame's col function, as the scope is incorrect in the lambda.
- sumType
the type used to sum across rows
- filter
filter only input rows interesting to count (similar to CountIf, SumIf)
- sum
adds to the current sum, taking the current sum as its parameter; sum_with, inc, map_with etc. can be used as implementations
- result
processes the sum result and row count (after filtering) to produce the final result of the aggregate
- Definition Classes
- AggregateFunctionImports
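As an illustration of the filter/sum/result contract, a plain-Scala model of the semantics (not the Spark implementation; the names here are hypothetical) might look like:

```scala
// Plain-Scala model of agg_expr's semantics, NOT the library implementation:
// filter rows, fold a sum across the kept rows, then let `result` combine
// the final sum with the filtered row count (as meanf does).
def aggExprModel[A](rows: Seq[A])(filter: A => Boolean)(
    sum: (Long, A) => Long)(result: (Long, Long) => Double): Double = {
  val kept = rows.filter(filter)
  val summed = kept.foldLeft(0L)(sum)
  result(summed, kept.size.toLong)
}

// Mean of values under 100, mirroring filter + sum_with + meanf
val mean = aggExprModel(Seq(1, 5, 10, 200))(_ < 100)((acc, v) => acc + v)(
  (s, c) => s.toDouble / c)
```

The real agg_expr evaluates these pieces as Spark Columns across a DataFrame rather than a local Seq, but the shape of the three expressions is the same.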
- def as_uuid(lower: Column, higher: Column): Column
Converts a lower and higher pair of longs into a uuid string
- def comparable_maps(map: Column): Column
Efficiently converts the map column to struct for comparison, unioning, sorting etc.
- Definition Classes
- ComparableMapsImports
- val default_rule: Column
The default_rule value
- Definition Classes
- RuleRunnerFunctionImports
- def digest_to_longs(digestImpl: String, cols: Column*): Column
Converts columns into a digest via the MessageDigest digestImpl
- returns
array of long
- Definition Classes
- HashRelatedFunctionImports
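The array-of-longs result shape can be illustrated with plain JDK classes; the library's exact byte-to-long mapping is an assumption here, not confirmed by this doc:

```scala
import java.nio.ByteBuffer
import java.security.MessageDigest

// Illustration of viewing a MessageDigest result as longs (i0, i1, ...);
// the library's actual byte ordering is an assumption.
val digest = MessageDigest.getInstance("MD5").digest("a,b,c".getBytes("UTF-8"))
val buf = ByteBuffer.wrap(digest)
val longs = Array.fill(digest.length / 8)(buf.getLong) // MD5: 128 bits -> 2 longs
```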
- def digest_to_longs_struct(digestImpl: String, cols: Column*): Column
Converts columns into a digest via the MessageDigest digestImpl
- returns
struct with fields i0, i1, i2 etc.
- Definition Classes
- HashRelatedFunctionImports
- val disabled_rule: Column
The disabled_rule value
- Definition Classes
- RuleRunnerFunctionImports
- def drop_field(update: Column, fieldNames: String*): Column
Drops a field from a structure
- val failed: Column
The failed value
- Definition Classes
- RuleRunnerFunctionImports
- def fieldBasedIDF(func: String, prefix: String, digestImpl: String, cols: Column*): Column
- Attributes
- protected
- Definition Classes
- HashRelatedFunctionImports
- def field_based_id(prefix: String, digestImpl: String, cols: Column*): Column
Creates an id from fields using MessageDigests
- Definition Classes
- HashRelatedFunctionImports
- def flatten_folder_results(result: Column): Column
Flattens folder results, unpacking the nested structure into a simple relation
- Definition Classes
- RuleRunnerFunctionImports
- def flatten_results(result: Column): Column
Flattens DQ results, unpacking the nested structure into a simple relation
- Definition Classes
- RuleRunnerFunctionImports
- def flatten_rule_results(result: Column): Column
Flattens rule results, unpacking the nested structure into a simple relation
- Definition Classes
- RuleRunnerFunctionImports
- def from_yaml(yaml: Column, dataType: DataType): Column
Converts yaml expressions to spark native types
- dataType
the yaml's data type
- Definition Classes
- YamlFunctionImports
- def group_results(runnerResults: Column, processResult: Option[(Column) => Column] = None): Column
Groups runnerResults which represent the result column from an array of rule runners, expressionRunner excluded, or an array of RuleSuiteGroupResults. Note many of the results will contain nullable fields, see GroupResultsTest for example result types with Scala Encoders. The allowed input types are:
- array(ruleRunner DQ results) which returns RuleSuiteGroupResults
- array(ruleEngineResults) which returns either (RuleSuiteGroupResults, array((salientRule, result))) or (RuleSuiteGroupResults, array((salientRule, array((salience, result))))) for debug
- array(ruleFolderResults) which returns either (RuleSuiteGroupResults, array(result)) or (RuleSuiteGroupResults, array(array((salience, result)))) for debug
- array(collectRunner results) which returns (RuleSuiteGroupResults, array(result)); you can also use processResult to flatten
- array(group_results(array(group_results(collect_runner ... will return (RuleSuiteGroupResults, array(array(array(X))))
- runnerResults
the runner result column
- processResult
a lambda to process the result column; not applicable to DQ results. This removes the need for an additional select/projection to process results, and is ideal for calling flatten on results
- Definition Classes
- RuleResultImport
- def hashF(func: String, digestImpl: String, cols: Column*): Column
- Attributes
- protected
- Definition Classes
- HashRelatedFunctionImports
- def hash_field_based_id(prefix: String, digestImpl: String, cols: Column*): Column
Creates an id from fields using Guava Hashers
- Definition Classes
- HashRelatedFunctionImports
- def hash_with(digestImpl: String, cols: Column*): Column
Converts columns into a digest using Guava Hashers
- returns
array of long
- Definition Classes
- HashRelatedFunctionImports
- def hash_with_struct(digestImpl: String, cols: Column*): Column
Converts columns into a digest using Guava Hashers
- returns
struct with fields i0, i1, i2 etc.
- Definition Classes
- HashRelatedFunctionImports
- def id_base64(idFields: Column*): Column
Converts either a single ID or individual field parts into base64. The parts must be provided in the correct order, base, i0, i1.. iN
- Definition Classes
- GuaranteedUniqueIDImports
- def id_equal(aPrefix: String, bPrefix: String): Column
Similar to long_pair_equal but against 160-bit IDs.
- def id_from_base64(base64: Column, size: Int = 2): Column
Given a base64 string convert to an ID, use id_size to understand how large IDs could be.
- size
defaults to 2 (160-bit ID)
- Definition Classes
- GuaranteedUniqueIDImports
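A hypothetical round trip of a (base, i0, i1) shaped ID through base64, using only JDK classes; the library's actual encoding and field widths are assumptions here, not taken from this doc:

```scala
import java.nio.ByteBuffer
import java.util.Base64

// Hypothetical 160-bit layout: base as an Int plus two Longs (i0, i1).
val (base, i0, i1) = (1, 42L, -7L)
val bytes = ByteBuffer.allocate(4 + 8 + 8).putInt(base).putLong(i0).putLong(i1).array()
val encoded = Base64.getEncoder.encodeToString(bytes)

// Decode back to the constituent parts
val back = ByteBuffer.wrap(Base64.getDecoder.decode(encoded))
val decoded = (back.getInt, back.getLong, back.getLong)
```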
- def id_raw_type(id: Column): Column
Returns the underlying raw type of an id (base, i0, i1 etc.) without prefixes
- Definition Classes
- GuaranteedUniqueIDImports
- def id_size(id: Column): Column
Returns the size of an underlying ID; a unique_id will have 2, other ids may have more, each further increment is another 64 bits
- Definition Classes
- GuaranteedUniqueIDImports
- val ignored_rule: Column
The ignored_rule value
- Definition Classes
- RuleRunnerFunctionImports
- def inc(incrementWith: Column): SumExpression
Adds incrementWith to the sum value
- Definition Classes
- AggregateFunctionImports
- val inc: SumExpression
Adds 1L to the sum value
- Definition Classes
- AggregateFunctionImports
- def long_pair(lower: Column, higher: Column): Column
Creates a (lower, higher) struct
- Definition Classes
- LongPairImports
- def long_pair_equal(aPrefix: String, bPrefix: String): Column
Compares aPrefix_lower = bPrefix_lower and aPrefix_higher = bPrefix_higher
- def long_pair_from_uuid(uuid: Column): Column
Creates a (lower, higher) struct from a UUID's least and most significant bits
- Definition Classes
- LongPairImports
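The (lower, higher) pair corresponds to a UUID's least and most significant bits, which can be shown with plain java.util.UUID (as_uuid mirrors the rebuild step):

```scala
import java.util.UUID

// The pair maps onto a UUID's least/most significant 64-bit halves.
val u = UUID.fromString("123e4567-e89b-12d3-a456-426614174000")
val lower  = u.getLeastSignificantBits  // the pair's "lower" long
val higher = u.getMostSignificantBits   // the pair's "higher" long
val rebuilt = new UUID(higher, lower)   // as_uuid(lower, higher) mirrors this
```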
- def map_contains(mapLookupName: String, lookupKey: Column, mapLookups: quality.MapLookups): Column
Tests if there is a stored value from a map via the name mapLookupName and 'key' lookupKey. Implementation is map_lookup.isNotNull
- Definition Classes
- MapLookupFunctionImports
- def map_lookup(mapLookupName: String, lookupKey: Column, mapLookups: quality.MapLookups): Column
Retrieves the stored value from a map via the name mapLookupName and 'key' lookupKey
- Definition Classes
- MapLookupFunctionImports
- def map_with(id: Column, sum: (Column) => Column): SumExpression
Creates an entry in a map sum with id and the result of 'sum', with the previous sum at that id as its parameter.
- sum
the parameter is the previous value of the map's id entry
- Definition Classes
- AggregateFunctionImports
- val meanf: ResultsExpression
Provides the mean (summed value / count of filtered rows)
- Definition Classes
- AggregateFunctionImports
- def murmur3ID(prefix: String, child1: Column, restOfchildren: Column*): Column
- Definition Classes
- GenericLongBasedImports
- def murmur3ID(prefix: String, children: Seq[Column]): Column
Murmur3 hash
- Definition Classes
- GenericLongBasedImports
- def pack_ints(id: Id): Column
Packs two integers into a long, typically used for versioned ids.
- Definition Classes
- PackIdImports
- def pack_ints(id: Int, version: Int): Column
Packs two integers into a long, typically used for versioned ids.
- Definition Classes
- PackIdImports
- def pack_ints(id: Column, version: Column): Column
Packs two integers into a long, typically used for versioned ids.
- Definition Classes
- PackIdImports
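The idea of packing two 32-bit values into one 64-bit long can be sketched in plain Scala; the bit layout below is a hypothetical illustration, the actual ordering used by PackIdImports is not stated in this doc:

```scala
// Hypothetical bit layout: id in the high 32 bits, version in the low 32 bits.
def packInts(id: Int, version: Int): Long =
  (id.toLong << 32) | (version.toLong & 0xFFFFFFFFL)

// Inverse: recover the (id, version) pair, as unpack does for the real packing.
def unpackLong(packed: Long): (Int, Int) =
  ((packed >>> 32).toInt, packed.toInt)

val roundTrip = unpackLong(packInts(1234, 2))
```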
- val passed: Column
The passed value
- Definition Classes
- RuleRunnerFunctionImports
- def prefixed_to_long_pair(source: Column, prefix: String): Column
Converts a prefixed long pair to lower, higher
- Definition Classes
- LongPairImports
- def probability(result: Column): Column
Returns the probability from a given rule result
- Definition Classes
- RuleRunnerFunctionImports
- def provided_id(prefix: String, child: Column): Column
Creates a hash based ID based on an upstream compatible long generator
- Definition Classes
- GenericLongBasedImports
- def results_with(result: (Column, Column) => Column): ResultsExpression
Produces an aggregate result
- result
the sum and count are parameters
- Definition Classes
- AggregateFunctionImports
- val return_both: ResultsExpression
Returns both the count and sum
- Definition Classes
- AggregateFunctionImports
- val return_sum: ResultsExpression
Returns the sum, ignoring the count
- Definition Classes
- AggregateFunctionImports
- def reverse_comparable_maps(mapStruct: Column): Column
Efficiently converts the mapStruct column to its original Map type
- Definition Classes
- ComparableMapsImports
- def rngID(prefix: String): Column
Creates a default randomRNG based on RandomSource.XO_RO_SHI_RO_128_PP
- Definition Classes
- GenericLongBasedImports
- def rng_bytes(randomSource: RandomSource = RandomSource.XO_RO_SHI_RO_128_PP, numBytes: Int = 16, seed: Long = 0): Column
Creates a random number generator using a given commons-rng source
- randomSource
commons-rng random source
- numBytes
the number of bytes to produce in the array, defaulting to 16
- seed
the seed to use / mixin
- returns
a column with the appropriate rng defined
- Definition Classes
- RngFunctionImports
- def rng_id(prefix: String, randomSource: RandomSource, seed: Long = 0L): Column
Creates a randomRNG ID based on randomSource with a given seed
- Definition Classes
- GenericLongBasedImports
- def rng_uuid(column: Column): Column
Creates a uuid from byte arrays or two longs, use with the rng() function to generate random uuids.
- Definition Classes
- RngFunctionImports
- def rule_result(ruleSuiteResults: Column, ruleSuiteId: Int, ruleSuiteVersion: Int, ruleSetId: Int, ruleSetVersion: Int, ruleId: Int, ruleVersion: Int): Column
Retrieves the rule result for a given id; the result type depends on ruleSuiteResults' type. Integer is returned for DQ checks and either String or (ruleResult: String, resultDDL: String) for ExpressionResults.
- Definition Classes
- RuleResultImport
- def rule_result(ruleSuiteResults: Column, ruleSuiteId: Column, ruleSuiteVersion: Column, ruleSetId: Column, ruleSetVersion: Column, ruleId: Column, ruleVersion: Column): Column
Retrieves the rule result for a given id; the result type depends on ruleSuiteResults' type. Integer is returned for DQ checks and either String or (ruleResult: String, resultDDL: String) for ExpressionResults.
- Definition Classes
- RuleResultImport
- def rule_result(ruleSuiteResults: Column, ruleSuiteId: Column, ruleSetId: Column, ruleId: Column): Column
Retrieves the rule result for a given id; the result type depends on ruleSuiteResults' type. Integer is returned for DQ checks and either String or (ruleResult: String, resultDDL: String) for ExpressionResults.
- Definition Classes
- RuleResultImport
- def rule_suite_result_details(result: Column): Column
Consumes a RuleSuiteResult and returns RuleSuiteDetails
- Definition Classes
- RuleRunnerFunctionImports
- def rule_suite_statistics(results: Column): Column
Aggregates over RuleSuiteResults columns and returns a RuleSuiteGroupStatistics row
- Definition Classes
- AggregateFunctionImports
- val soft_failed: Column
The soft_failed value
- Definition Classes
- RuleRunnerFunctionImports
- def strip_result_ddl(expressionResults: Column): Column
Stores only the ruleResult, removing the structure including the resultDDL column
- Definition Classes
- StripResultTypesFunction
- def sum_with(sum: (Column) => Column): SumExpression
Given the current sum, produce the next sum, for example by incrementing 1 on the sum to count filtered rows
- Definition Classes
- AggregateFunctionImports
- def to_yaml(col: Column, renderOptions: Map[String, String] = Map.empty): Column
Converts spark expressions to yaml using SnakeYAML.
- Definition Classes
- YamlFunctionImports
- def unify_result(runnerResults: Column): Column
Converts non-debug engine results into a single result column, allowing engine, collector and folder results to be nested and combined via group_results within the same nested rule suite calls. Similarly, non-debug RuleSuiteGroupResults can be converted.
- Definition Classes
- RuleResultImport
- def unique_id(prefix: String): Column
Creates a uniqueID backed by the GuaranteedUniqueID Spark Snowflake ID approach
- Definition Classes
- GuaranteedUniqueIDImports
- def unpack(packedId: Column): Column
Takes a packedId long and unpacks to id, version
- Definition Classes
- PackIdImports
- def unpack_id_triple(idTriple: Column): Column
Unpacks an IdTriple column into its six constituent integers
- Definition Classes
- PackIdImports
- def update_field(update: Column, transformations: (String, Column)*): Column
Adds fields in order; for each field path, its paired transformation is applied to the update column
- returns
a new copy of update with the changes applied
- Definition Classes
- StructFunctionsImport
- def za_field_based_id(prefix: String, digestImpl: String, cols: Column*): Column
Creates an id from fields using ZeroAllocation LongHashFactory (64-bit)
- Definition Classes
- HashRelatedFunctionImports
- def za_hash_longs_with(digestImpl: String, cols: Column*): Column
Converts columns into a digest via ZeroAllocation LongTuple Factory (128-bit)
- returns
array of long
- Definition Classes
- HashRelatedFunctionImports
- def za_hash_longs_with_struct(digestImpl: String, cols: Column*): Column
Converts columns into a digest via ZeroAllocation LongTuple Factory (128-bit)
- returns
struct with fields i0, i1, i2 etc.
- Definition Classes
- HashRelatedFunctionImports
- def za_hash_with(digestImpl: String, cols: Column*): Column
Converts columns into a digest via ZeroAllocation LongHashFactory (64-bit)
- returns
array of long
- Definition Classes
- HashRelatedFunctionImports
- def za_hash_with_struct(digestImpl: String, cols: Column*): Column
Converts columns into a digest via ZeroAllocation LongHashFactory (64-bit)
- returns
struct with fields i0, i1, i2 etc.
- Definition Classes
- HashRelatedFunctionImports
- def za_longs_field_based_id(prefix: String, digestImpl: String, cols: Column*): Column
Creates an id from fields using ZeroAllocation LongTuple Factory (128-bit)
- Definition Classes
- HashRelatedFunctionImports
Deprecated Value Members
- def fieldBasedID(prefix: String, digestImpl: String, children: Column*): Column
Creates an id from fields using MessageDigests; in line with SQL naming, please use field_based_id
- Definition Classes
- HashRelatedFunctionImports
- Annotations
- @deprecated
- Deprecated
(Since version 0.1.0) migrate to field_based_id
- def providedID(prefix: String, child: Column): Column
Creates a hash based ID based on an upstream compatible long generator; in line with SQL function naming, please migrate to provided_id
- Definition Classes
- GenericLongBasedImports
- Annotations
- @deprecated
- Deprecated
(Since version 0.1.0) migrate to provided_id, providedID will be removed in 0.3.0
- def rule_suite_statistics_aggregator(results: Column): Column
Prefer rule_suite_statistics; this function is only provided as a fallback should there be issues in the rule_suite_statistics implementation, which is over 6% faster.
Aggregates over RuleSuiteResults columns and returns a RuleSuiteGroupStatistics row using a pre-Spark 4 unified API Aggregator; per https://github.com/sparkutils/quality/issues/117 this does not run on Databricks Shared Clusters.
- Definition Classes
- AggregateFunctionImports
- Annotations
- @deprecated
- Deprecated
(Since version 0.2.0) This aggregation implementation will be removed in 0.3.0 and should only be used if rule_suite_statistics has issues