package com.sparkutils.quality.functions

Collection of the Quality Spark Expressions for use in select( Column * )

Type Members

  1. final class EmptyToGenerateScalaDocs extends AnyRef

    Forces scaladoc to generate the package - https://github.com/scala/bug/issues/8124

    Attributes
    protected[quality]

Value Members

  1. def agg_expr(sumType: DataType, filter: Column, sum: SumExpression, result: ResultsExpression): Column

    Creates an aggregate by applying filter to rows, calling sum with a starting value (provided by zero), and finally calling result to process the sum and count values into a final result.

    Note: when working with lambdas in the DSL it is often necessary to use the DataFrame's col function, as the scope is incorrect inside the lambda.

    sumType

    the type used to sum across rows

    filter

    selects only the input rows of interest to count (similar to CountIf, SumIf)

    sum

    adds to the current sum, taking the current sum as its parameter; sum_with, inc, map_with etc. can be used as implementations

    result

    processes the sum result and row count (after filtering) to produce the final result of the aggregate

    Definition Classes
    AggregateFunctionImports
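
    The filter, sum and result steps described above can be modelled on a plain Scala collection. This is only a sketch of the semantics under stated assumptions: AggModel, aggregate and the explicit zero argument are hypothetical names for illustration, and nothing here is the Quality implementation or uses Spark Columns.

```scala
// Plain-collection model of the filter -> sum -> result pipeline.
// AggModel and aggregate are hypothetical names, not part of the Quality API.
object AggModel {
  def aggregate[A, S, R](rows: Seq[A], zero: S)(
      filter: A => Boolean,   // selects the rows that take part
      sum: (S, A) => S,       // produces the next sum from the current sum and a row
      result: (S, Long) => R  // produces the final value from the sum and the filtered row count
  ): R = {
    val kept = rows.filter(filter)
    val total = kept.foldLeft(zero)(sum)
    result(total, kept.size.toLong)
  }
}
```

    For example, AggModel.aggregate(Seq(1, 2, 3, 4), 0L)(_ > 1, (s, x) => s + x, (s, c) => s.toDouble / c) computes the mean of the rows that pass the filter, which is roughly the role played by combining a sum expression with meanf.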
  2. def as_uuid(lower: Column, higher: Column): Column

    Converts a lower and higher pair of longs into a uuid string

  3. def comparable_maps(map: Column): Column

    Efficiently converts the map column to struct for comparison, unioning, sorting etc.

    Definition Classes
    ComparableMapsImports
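
    Maps have no inherent ordering, which is why a comparable representation is useful. One common way to obtain one is to sort the entries by key, sketched below in plain Scala. ComparableMapModel and its methods are hypothetical names, and the sorted-entries layout is an assumption made for illustration rather than a description of the structure Quality produces.

```scala
// Sketch: a key-sorted sequence of entries as a comparable stand-in for a map.
// ComparableMapModel is a hypothetical name, not part of the Quality API.
object ComparableMapModel {
  // Sorting by key gives the same sequence regardless of insertion order.
  def toComparable[K: Ordering, V](m: Map[K, V]): Vector[(K, V)] =
    m.toVector.sortBy(_._1)

  // Reverses the conversion, restoring a Map from the entries.
  def fromComparable[K, V](entries: Seq[(K, V)]): Map[K, V] =
    entries.toMap
}
```

    With this model two maps built in different insertion orders yield identical sequences, so they can be compared, sorted or unioned as ordinary values.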
  4. val default_rule: Column

    The default_rule value

    Definition Classes
    RuleRunnerFunctionImports
  5. def digest_to_longs(digestImpl: String, cols: Column*): Column

    Converts columns into a digest via the MessageDigest digestImpl

    returns

    array of long

    Definition Classes
    HashRelatedFunctionImports
  6. def digest_to_longs_struct(digestImpl: String, cols: Column*): Column

    Converts columns into a digest via the MessageDigest digestImpl

    returns

    struct with fields i0, i1, i2 etc.

    Definition Classes
    HashRelatedFunctionImports
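
    To illustrate the array-of-long and i0, i1, i2 struct shapes, the following sketch digests some values with java.security.MessageDigest and splits the resulting bytes into longs. It rests on assumptions: the DigestLongs object, the UTF-8 string input and the zero padding to a multiple of 8 bytes are all hypothetical choices, and how Quality serialises column values into the digest is not shown here.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Sketch only: DigestLongs is a hypothetical helper, not the Quality implementation.
object DigestLongs {
  // Feeds each value into the digest, then reads the digest bytes as big-endian longs.
  def digestToLongs(digestImpl: String, values: Seq[String]): Array[Long] = {
    val md = MessageDigest.getInstance(digestImpl)
    values.foreach(v => md.update(v.getBytes(StandardCharsets.UTF_8)))
    val bytes = md.digest()
    // Assumption: pad with zeros when the digest length is not a multiple of 8 bytes.
    val padded = java.util.Arrays.copyOf(bytes, ((bytes.length + 7) / 8) * 8)
    val buf = ByteBuffer.wrap(padded)
    Array.fill(padded.length / 8)(buf.getLong())
  }

  // Names the longs i0, i1, i2 ... to mirror the struct form.
  def asStructFields(longs: Array[Long]): Seq[(String, Long)] =
    longs.zipWithIndex.map { case (l, i) => (s"i$i", l) }.toSeq
}
```

    Under these assumptions an MD5 digest (16 bytes) gives two longs and a SHA-256 digest (32 bytes) gives four.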
  7. val disabled_rule: Column

    The disabled_rule value

    Definition Classes
    RuleRunnerFunctionImports
  8. def drop_field(update: Column, fieldNames: String*): Column

    Drops a field from a structure

    fieldNames

    may be nested

    Definition Classes
    StructFunctionsImport
  9. val failed: Column

    The failed value

    Definition Classes
    RuleRunnerFunctionImports
  10. def fieldBasedIDF(func: String, prefix: String, digestImpl: String, cols: Column*): Column
    Attributes
    protected
    Definition Classes
    HashRelatedFunctionImports
  11. def field_based_id(prefix: String, digestImpl: String, cols: Column*): Column

    Creates an id from fields using MessageDigests

    Definition Classes
    HashRelatedFunctionImports
  12. def flatten_folder_results(result: Column): Column

    Flattens folder results, unpacking the nested structure into a simple relation

    Definition Classes
    RuleRunnerFunctionImports
  13. def flatten_results(result: Column): Column

    Flattens DQ results, unpacking the nested structure into a simple relation

    Definition Classes
    RuleRunnerFunctionImports
  14. def flatten_rule_results(result: Column): Column

    Flattens rule results, unpacking the nested structure into a simple relation

    Definition Classes
    RuleRunnerFunctionImports
  15. def from_yaml(yaml: Column, dataType: DataType): Column

    Converts yaml expressions to spark native types

    dataType

    the yaml's data type

    Definition Classes
    YamlFunctionImports
  16. def group_results(runnerResults: Column, processResult: Option[(Column) => Column] = None): Column

    Groups runnerResults, which represent the result column from an array of rule runners (expressionRunner excluded) or an array of RuleSuiteGroupResults. Note that many of the results will contain nullable fields; see GroupResultsTest for example result types with Scala Encoders. The allowed input types are:

    - array(ruleRunner DQ results), which returns RuleSuiteGroupResults
    - array(ruleEngineResults), which returns either (RuleSuiteGroupResults, array((salientRule, result))) or, for debug, (RuleSuiteGroupResults, array((salientRule, array((salience, result)))))
    - array(ruleFolderResults), which returns either (RuleSuiteGroupResults, array(result)) or, for debug, (RuleSuiteGroupResults, array(array((salience, result))))
    - array(collectRunner results), which returns (RuleSuiteGroupResults, array(result)); you can also use processResult to flatten
    - array( group_results( array( group_results( collect_runner ... which will return (RuleSuiteGroupResults, array(array(array(X))))

    runnerResults

    the runner result column

    processResult

    a lambda to process the result column (not applicable to DQ results); this removes the need for an additional select/projection to process results and is ideal for calling flatten on results

    Definition Classes
    RuleResultImport
  17. def hashF(func: String, digestImpl: String, cols: Column*): Column
    Attributes
    protected
    Definition Classes
    HashRelatedFunctionImports
  18. def hash_field_based_id(prefix: String, digestImpl: String, cols: Column*): Column

    Creates an id from fields using Guava Hashers

    Definition Classes
    HashRelatedFunctionImports
  19. def hash_with(digestImpl: String, cols: Column*): Column

    Converts columns into a digest using Guava Hashers

    returns

    array of long

    Definition Classes
    HashRelatedFunctionImports
  20. def hash_with_struct(digestImpl: String, cols: Column*): Column

    Converts columns into a digest using Guava Hashers

    returns

    struct with fields i0, i1, i2 etc.

    Definition Classes
    HashRelatedFunctionImports
  21. def id_base64(idFields: Column*): Column

    Converts either a single ID or individual field parts into base64. The parts must be provided in the correct order, base, i0, i1.. iN

    Definition Classes
    GuaranteedUniqueIDImports
  22. def id_equal(aPrefix: String, bPrefix: String): Column

    Similar to long_pair_equal but against 160 bit ids.

  23. def id_from_base64(base64: Column, size: Int = 2): Column

    Converts a base64 string to an ID; use id_size to understand how large IDs could be.

    size

    defaults to 2 (160bit ID)

    Definition Classes
    GuaranteedUniqueIDImports
  24. def id_raw_type(id: Column): Column

    Returns the underlying raw type of an id (base, i0, i1 etc.) without prefixes

    Definition Classes
    GuaranteedUniqueIDImports
  25. def id_size(id: Column): Column

    Returns the size of an underlying ID. A unique_id has a size of 2; other ids may have more, and each further increment is another 64 bits.

    Definition Classes
    GuaranteedUniqueIDImports
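
    The relationship between size and ID width follows from the figures given in this page: a size of 2 is a 160 bit ID and each further increment adds 64 bits. A small worked sketch of that arithmetic, where IdBits is a hypothetical helper and not part of the library:

```scala
// Worked arithmetic only: IdBits is a hypothetical name, not part of the Quality API.
object IdBits {
  // 160 bits at size 2, plus 64 bits for each further increment (intended for size >= 2).
  def bits(size: Int): Int = 160 + (size - 2) * 64

  // The same width expressed in bytes.
  def bytes(size: Int): Int = bits(size) / 8
}
```

    So IdBits.bits(2) is 160 (20 bytes) and IdBits.bits(3) is 224 (28 bytes).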
  26. val ignored_rule: Column

    The ignored_rule value

    Definition Classes
    RuleRunnerFunctionImports
  27. def inc(incrementWith: Column): SumExpression

    Adds incrementWith to the sum value

    Definition Classes
    AggregateFunctionImports
  28. val inc: SumExpression

    Adds 1L to the sum value

    Definition Classes
    AggregateFunctionImports
  29. def long_pair(lower: Column, higher: Column): Column

    creates a (lower, higher) struct

    Definition Classes
    LongPairImports
  30. def long_pair_equal(aPrefix: String, bPrefix: String): Column

    Compares aPrefix_lower = bPrefix_lower and aPrefix_higher = bPrefix_higher

  31. def long_pair_from_uuid(uuid: Column): Column

    creates a (lower, higher) struct from a uuid's least and most significant bits

    Definition Classes
    LongPairImports
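
    The mapping between a uuid and a (lower, higher) pair can be sketched with java.util.UUID, taking lower as the least significant bits and higher as the most significant bits as described above. LongPairUuid and its LongPair case class are hypothetical names for illustration, and the reverse direction assumes as_uuid uses the same bit assignment.

```scala
import java.util.UUID

// Sketch only: LongPairUuid and LongPair are hypothetical names, not the Quality API.
object LongPairUuid {
  final case class LongPair(lower: Long, higher: Long)

  // lower = least significant bits, higher = most significant bits
  def fromUuid(uuid: String): LongPair = {
    val u = UUID.fromString(uuid)
    LongPair(u.getLeastSignificantBits, u.getMostSignificantBits)
  }

  // java.util.UUID's constructor takes (mostSigBits, leastSigBits)
  def asUuid(pair: LongPair): String =
    new UUID(pair.higher, pair.lower).toString
}
```

    Converting a uuid string to a pair and back with these two functions returns the original (lower-case) string.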
  32. def map_contains(mapLookupName: String, lookupKey: Column, mapLookups: quality.MapLookups): Column

    Tests if there is a stored value from a map via the name mapLookupName and 'key' lookupKey. Implementation is map_lookup.isNotNull

    Definition Classes
    MapLookupFunctionImports
  33. def map_lookup(mapLookupName: String, lookupKey: Column, mapLookups: quality.MapLookups): Column

    Retrieves the stored value from a map via the name mapLookupName and 'key' lookupKey

    Definition Classes
    MapLookupFunctionImports
  34. def map_with(id: Column, sum: (Column) => Column): SumExpression

    Creates an entry in a map sum with id and the result of 'sum', which takes the previous sum at that id as its parameter.

    sum

    the parameter is the previous value of the map's id entry

    Definition Classes
    AggregateFunctionImports
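
    The per-id update described above can be modelled as a fold that keeps one running value per id. This is a sketch under assumptions: MapSumModel is a hypothetical name, rows are reduced to just their ids, and the explicit zero used when an id has no previous entry is an assumption about the starting value.

```scala
// Plain-collection model of a map sum; MapSumModel is a hypothetical name.
object MapSumModel {
  // For each row id, replace the entry at that id with sum(previous value at that id).
  def mapWith[K](ids: Seq[K], zero: Long)(sum: Long => Long): Map[K, Long] =
    ids.foldLeft(Map.empty[K, Long]) { (acc, id) =>
      // Assumption: ids not yet present start from zero.
      acc.updated(id, sum(acc.getOrElse(id, zero)))
    }
}
```

    For example, MapSumModel.mapWith(Seq("a", "b", "a"), 0L)(_ + 1) counts the rows per id.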
  35. val meanf: ResultsExpression

    Provides the mean (summed value / count of filtered rows)

    Definition Classes
    AggregateFunctionImports
  36. def murmur3ID(prefix: String, child1: Column, restOfchildren: Column*): Column
    Definition Classes
    GenericLongBasedImports
  37. def murmur3ID(prefix: String, children: Seq[Column]): Column

    Murmur3 hash

    Definition Classes
    GenericLongBasedImports
  38. def pack_ints(id: Id): Column

    Packs two integers into a long, typically used for versioned ids.

    Definition Classes
    PackIdImports
  39. def pack_ints(id: Int, version: Int): Column

    Packs two integers into a long, typically used for versioned ids.

    Definition Classes
    PackIdImports
  40. def pack_ints(id: Column, version: Column): Column

    Packs two integers into a long, typically used for versioned ids.

    Definition Classes
    PackIdImports
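
    Packing two 32 bit integers into one 64 bit long is usually done with a shift and a mask, as sketched below. PackInts is a hypothetical name, and placing id in the upper 32 bits with version in the lower 32 bits is an assumption; the bit layout Quality actually uses is not stated in this page.

```scala
// Sketch only: PackInts is a hypothetical name and the bit layout is an assumption.
object PackInts {
  // id occupies the upper 32 bits, version the lower 32 bits.
  def pack(id: Int, version: Int): Long =
    (id.toLong << 32) | (version.toLong & 0xFFFFFFFFL) // mask stops sign extension of version

  // Recovers (id, version) from a packed long.
  def unpack(packed: Long): (Int, Int) =
    ((packed >> 32).toInt, packed.toInt)
}
```

    The mask matters for negative versions: without it the sign extension of version would overwrite the id bits, whereas with it PackInts.unpack(PackInts.pack(-5, -7)) returns (-5, -7).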
  41. val passed: Column

    The passed value

    Definition Classes
    RuleRunnerFunctionImports
  42. def prefixed_to_long_pair(source: Column, prefix: String): Column

    Converts a prefixed long pair to lower, higher

    Definition Classes
    LongPairImports
  43. def probability(result: Column): Column

    Returns the probability from a given rule result

    Definition Classes
    RuleRunnerFunctionImports
  44. def provided_id(prefix: String, child: Column): Column

    Creates a hash based ID based on an upstream compatible long generator

    Definition Classes
    GenericLongBasedImports
  45. def results_with(result: (Column, Column) => Column): ResultsExpression

    Produces an aggregate result

    result

    the sum and count are parameters

    Definition Classes
    AggregateFunctionImports
  46. val return_both: ResultsExpression

    returns both the count and sum

    Definition Classes
    AggregateFunctionImports
  47. val return_sum: ResultsExpression

    returns the sum, ignoring the count

    Definition Classes
    AggregateFunctionImports
  48. def reverse_comparable_maps(mapStruct: Column): Column

    Efficiently converts the mapStruct column to its original Map type

    Definition Classes
    ComparableMapsImports
  49. def rngID(prefix: String): Column

    Creates a default randomRNG based on RandomSource.XO_RO_SHI_RO_128_PP

    Definition Classes
    GenericLongBasedImports
  50. def rng_bytes(randomSource: RandomSource = RandomSource.XO_RO_SHI_RO_128_PP, numBytes: Int = 16, seed: Long = 0): Column

    Creates a random number generator using a given commons-rng source

    randomSource

    commons-rng random source

    numBytes

    the number of bytes to produce in the array, defaulting to 16

    seed

    the seed to use / mixin

    returns

    a column with the appropriate rng defined

    Definition Classes
    RngFunctionImports
  51. def rng_id(prefix: String, randomSource: RandomSource, seed: Long = 0L): Column

    Creates a randomRNG ID based on randomSource with a given seed

    Definition Classes
    GenericLongBasedImports
  52. def rng_uuid(column: Column): Column

    Creates a uuid from byte arrays or two longs, use with the rng() function to generate random uuids.

    Definition Classes
    RngFunctionImports
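
    Building a uuid from a 16 byte array amounts to reading the bytes as two longs. The sketch below shows that step with java.util.UUID, using java.util.Random as a stand-in for a commons-rng source. BytesToUuid is a hypothetical name, the big-endian byte order is an assumption, and whether Quality adjusts the uuid version or variant bits is not covered.

```scala
import java.nio.ByteBuffer
import java.util.{Random, UUID}

// Sketch only: BytesToUuid is a hypothetical name, not the Quality implementation.
object BytesToUuid {
  // Reads the first 8 bytes as the most significant bits and the last 8 as the least.
  def uuidFromBytes(bytes: Array[Byte]): UUID = {
    require(bytes.length == 16, "expected exactly 16 bytes")
    val buf = ByteBuffer.wrap(bytes)
    new UUID(buf.getLong(), buf.getLong())
  }

  // Stand-in random byte source; java.util.Random replaces commons-rng here.
  def randomBytes(numBytes: Int, seed: Long): Array[Byte] = {
    val bytes = new Array[Byte](numBytes)
    new Random(seed).nextBytes(bytes)
    bytes
  }
}
```

    For example, BytesToUuid.uuidFromBytes(BytesToUuid.randomBytes(16, 42L)) gives the same uuid on every run because the seed is fixed.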
  53. def rule_result(ruleSuiteResults: Column, ruleSuiteId: Int, ruleSuiteVersion: Int, ruleSetId: Int, ruleSetVersion: Int, ruleId: Int, ruleVersion: Int): Column

    Retrieves the rule result for a given id, the result type is dependent on ruleSuiteResults's type. Integer is returned for DQ checks and either String or (ruleResult: String, resultDDL: String) for ExpressionResults.

    Definition Classes
    RuleResultImport
  54. def rule_result(ruleSuiteResults: Column, ruleSuiteId: Column, ruleSuiteVersion: Column, ruleSetId: Column, ruleSetVersion: Column, ruleId: Column, ruleVersion: Column): Column

    Retrieves the rule result for a given id, the result type is dependent on ruleSuiteResults's type. Integer is returned for DQ checks and either String or (ruleResult: String, resultDDL: String) for ExpressionResults.

    Definition Classes
    RuleResultImport
  55. def rule_result(ruleSuiteResults: Column, ruleSuiteId: Column, ruleSetId: Column, ruleId: Column): Column

    Retrieves the rule result for a given id, the result type is dependent on ruleSuiteResults's type. Integer is returned for DQ checks and either String or (ruleResult: String, resultDDL: String) for ExpressionResults.

    Definition Classes
    RuleResultImport
  56. def rule_suite_result_details(result: Column): Column

    Consumes a RuleSuiteResult and returns RuleSuiteDetails

    Definition Classes
    RuleRunnerFunctionImports
  57. def rule_suite_statistics(results: Column): Column

    Aggregates over RuleSuiteResults columns and returns a RuleSuiteGroupStatistics row

    Definition Classes
    AggregateFunctionImports
  58. val soft_failed: Column

    The soft_failed value

    Definition Classes
    RuleRunnerFunctionImports
  59. def strip_result_ddl(expressionResults: Column): Column

    Stores only the ruleResult, removing the structure including the resultDDL column

    Definition Classes
    StripResultTypesFunction
  60. def sum_with(sum: (Column) => Column): SumExpression

    Given the current sum, produce the next sum, for example by incrementing 1 on the sum to count filtered rows

    Definition Classes
    AggregateFunctionImports
  61. def to_yaml(col: Column, renderOptions: Map[String, String] = Map.empty): Column

    Converts spark expressions to yaml using SnakeYAML.

    Definition Classes
    YamlFunctionImports
  62. def unify_result(runnerResults: Column): Column

    Converts non-debug engine results into a single result column, allowing engine, collector and folder results to be nested and combined via group_results within the same nested rule suite calls. Similarly, non-debug RuleSuiteGroupResults can be converted.

    Definition Classes
    RuleResultImport
  63. def unique_id(prefix: String): Column

    Creates a uniqueID backed by the GuaranteedUniqueID Spark Snowflake ID approach

    Definition Classes
    GuaranteedUniqueIDImports
  64. def unpack(packedId: Column): Column

    Takes a packedId long and unpacks to id, version

    Definition Classes
    PackIdImports
  65. def unpack_id_triple(idTriple: Column): Column

    Unpacks an IdTriple column into its six constituent integers

    Definition Classes
    PackIdImports
  66. def update_field(update: Column, transformations: (String, Column)*): Column

    Adds fields in order: for each field path, its paired transformation is applied to the update column

    returns

    a new copy of update with the changes applied

    Definition Classes
    StructFunctionsImport
  67. def za_field_based_id(prefix: String, digestImpl: String, cols: Column*): Column

    Creates an id from fields using ZeroAllocation LongHashFactory (64bit)

    Definition Classes
    HashRelatedFunctionImports
  68. def za_hash_longs_with(digestImpl: String, cols: Column*): Column

    Converts columns into a digest via ZeroAllocation LongTuple Factory (128-bit)

    returns

    array of long

    Definition Classes
    HashRelatedFunctionImports
  69. def za_hash_longs_with_struct(digestImpl: String, cols: Column*): Column

    Converts columns into a digest via ZeroAllocation LongTuple Factory (128-bit)

    returns

    struct with fields i0, i1, i2 etc.

    Definition Classes
    HashRelatedFunctionImports
  70. def za_hash_with(digestImpl: String, cols: Column*): Column

    Converts columns into a digest via ZeroAllocation LongHashFactory (64bit)

    returns

    array of long

    Definition Classes
    HashRelatedFunctionImports
  71. def za_hash_with_struct(digestImpl: String, cols: Column*): Column

    Converts columns into a digest via ZeroAllocation LongHashFactory (64bit)

    returns

    struct with fields i0, i1, i2 etc.

    Definition Classes
    HashRelatedFunctionImports
  72. def za_longs_field_based_id(prefix: String, digestImpl: String, cols: Column*): Column

    Creates an id from fields using ZeroAllocation LongTuple Factory (128-bit)

    Definition Classes
    HashRelatedFunctionImports

Deprecated Value Members

  1. def fieldBasedID(prefix: String, digestImpl: String, children: Column*): Column

    Creates an id from fields using MessageDigests, in line with SQL naming please use field_based_id

    Definition Classes
    HashRelatedFunctionImports
    Annotations
    @deprecated
    Deprecated

    (Since version 0.1.0) migrate to field_based_id

  2. def providedID(prefix: String, child: Column): Column

    Creates a hash based ID based on an upstream compatible long generator, in line with sql functions please migrate to provided_id

    Definition Classes
    GenericLongBasedImports
    Annotations
    @deprecated
    Deprecated

    (Since version 0.1.0) migrate to provided_id, providedID will be removed in 0.3.0

  3. def rule_suite_statistics_aggregator(results: Column): Column

    Prefer rule_suite_statistics; this function is only provided as a fallback should there be issues in the rule_suite_statistics implementation, which is over 6% faster.

    Aggregates over RuleSuiteResults columns and returns a RuleSuiteGroupStatistics row using a pre Spark 4 unified API Aggregator. Per https://github.com/sparkutils/quality/issues/117 this does not run on Databricks Shared Clusters.

    Definition Classes
    AggregateFunctionImports
    Annotations
    @deprecated
    Deprecated

    (Since version 0.2.0) This aggregation implementation will be removed in 0.3.0 and should only be used if rule_suite_statistics has issues
