Packages

package quality

Provides an easy import point for the library.


Package Members

  1. package classicFunctions

    Collection of the Quality Spark expressions for use in select(Column*)

  2. package functions

    Collection of the Quality Spark expressions for use in select(Column*)

  3. package impl
  4. package implicits

    Imports the common implicits

  5. package simpleVersioning

    A simple versioning scheme that allows management of versions

  6. package sparkless

Type Members

  1. trait BloomLookup extends AnyRef

    Simple "does it contain" function to test a bloom

  2. case class BloomModel(rootDir: String, fpp: Double, numBuckets: Int) extends Serializable with Product

    Represents the shared file location of a bucketed bloom filter. There should be files with names 0..numBuckets containing the same number of bytes representing each bucket.

    rootDir

    The directory which contains each bucket

    fpp

    The fpp for this bloom - note it is informational only and will not be used in further processing

    numBuckets

    The number of buckets within this bloom
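
    A minimal construction sketch (parameter names from the signature above; the path is illustrative only):

    ```scala
    // Hypothetical shared location: rootDir must contain files named 0..numBuckets,
    // each with the same number of bytes representing one bucket.
    val model = BloomModel(
      rootDir = "/mnt/shared/blooms/customerIds", // illustrative path
      fpp = 0.01,                                 // informational only, not used in processing
      numBuckets = 8
    )
    ```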

  3. trait ClassicOnly extends Annotation
    Annotations
    @Retention()
  4. case class CombinedRuleRow(ruleRow: RuleRow, outputExpressionRow: Option[OutputExpressionRow]) extends Product with Serializable

    Raw model for Variable usage

  5. case class CombinedRuleSuiteRows(ruleSuiteId: Int, ruleSuiteVersion: Int, ruleRows: Seq[CombinedRuleRow], lambdaFunctions: Option[Seq[LambdaFunctionRow]], probablePass: Option[Double], defaultProcessor: Option[OutputExpressionRow]) extends Product with Serializable

    Raw model for Variable usage

  6. trait DataFrameLoader extends AnyRef

    Simple interface to load DataFrames used by map/bloom and view loading

  7. trait DefaultProcessor extends HasRuleText with HasOutputExpression

    Configuration of what should be evaluated when no trigger passes for Folder and Collector

  8. trait ExpressionRule extends Serializable

    A trigger rule

  9. case class GeneralExpressionResult(result: String, resultDDL: String) extends Product with Serializable

    Represents the expression results of ExpressionRunner

    result

    the result cast to string

    resultDDL

    the result type in ddl

    Annotations
    @SerialVersionUID()
  10. case class GeneralExpressionsResult[R](id: VersionedId, ruleSetResults: Map[VersionedId, Map[VersionedId, R]]) extends Serializable with Product

    Represents the results of the ExpressionRunner

    Annotations
    @SerialVersionUID()
  11. case class GeneralExpressionsResultNoDDL(id: VersionedId, ruleSetResults: Map[VersionedId, Map[VersionedId, String]]) extends Serializable with Product

    Represents the results of the ExpressionRunner after calling strip_result_ddl

    Annotations
    @SerialVersionUID()
  12. trait HasOutputExpression extends Serializable
  13. trait HasRuleText extends Serializable
  14. case class Id(id: Int, version: Int) extends VersionedId with Product with Serializable

    A versioned rule ID - note the name is never persisted in results, the id and version are sufficient to retrieve the name

    id

    a unique ID to identify this rule

    version

    the version of the rule - again tied to ID

    Annotations
    @SerialVersionUID()
  15. type IdTriple = (Id, Id, Id)
    Definition Classes
    LambdaFunctionsImports
  16. trait LambdaFunction extends HasRuleText

    A user defined SQL function

  17. case class LambdaFunctionRow(name: String, ruleExpr: String, functionId: Int, functionVersion: Int, ruleSuiteId: Int, ruleSuiteVersion: Int) extends Product with Serializable
  18. case class LazyRuleEngineResult[T](lazyRuleSuiteResults: LazyRuleSuiteResult, salientRule: Option[SalientRule], result: Option[T]) extends Serializable with Product

    Results for all rules run against a DataFrame, the RuleSuiteResult is lazily evaluated. Note in debug mode the type of T must be Seq[(Int, ActualType)]

    lazyRuleSuiteResults

    Overall results from applying the engine, evaluated lazily

    salientRule

    if None, either no rule matched for this row or it is in Debug mode, which returns all results.

    result

    The result type for this row; if no rule matched this will be None, and if a rule matched but the output expression returned null this will also be None

    Annotations
    @SerialVersionUID()
  19. case class LazyRuleFolderResult[T](lazyRuleSuiteResults: LazyRuleSuiteResult, result: Option[T]) extends Serializable with Product

    Results for all rules run against a DataFrame, the RuleSuiteResult is lazily evaluated. Note in debug mode the type of T must be Seq[(Int, ActualType)]

    lazyRuleSuiteResults

    Overall results from applying the engine, evaluated lazily

    result

    The result type for this row; if no rule matched this will be None, and if a rule matched but the output expression returned null this will also be None

    Annotations
    @SerialVersionUID()
  20. trait LazyRuleSuiteResult extends Serializable

    A lazy proxy for RuleSuiteResult

  21. trait LazyRuleSuiteResultDetails extends Serializable

    A lazy proxy for RuleSuiteResultDetails

    Annotations
    @SerialVersionUID()
  22. type MapCreator = () => (DataFrame, Column, Column)
    Definition Classes
    MapLookupImportsShared
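
    A sketch of providing a MapCreator (the reference-data path and column names are illustrative; `spark` and `col` are assumed in scope):

    ```scala
    // A MapCreator lazily supplies the DataFrame plus the key and value columns
    // from which the broadcast Map[AnyRef, AnyRef] lookup is built.
    val countryLookup: MapCreator = () => {
      val ref = spark.read.parquet("/mnt/reference/countries") // illustrative source
      (ref, col("isoCode"), col("countryName"))
    }
    ```
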
  23. type MapLookups = String

    Used as a param to load the map lookups - note the type of the broadcast is always Map[AnyRef, AnyRef]

    Definition Classes
    MapLookupImportsShared
  24. case class MetaRuleSetRow(ruleSuiteId: Int, ruleSuiteVersion: Int, ruleSetId: Int, ruleSetVersion: Int, columnFilter: String, ruleExpr: String) extends Product with Serializable

    Only one arg is supported, without brackets etc.

    Law: Each Rule generated must have a stable Id for the same column, the version used is the same as the RuleSet

    The caller of withNameAndOrd must enforce this law to have stable and correctly evolving rules.

  25. trait OutputExpression extends AnyRef

    Used as a result of serializing

  26. case class OutputExpressionRow(ruleExpr: String, functionId: Int, functionVersion: Int, ruleSuiteId: Int, ruleSuiteVersion: Int) extends Product with Serializable
  27. case class Probability(percentage: Double) extends RuleResult with Product with Serializable

    0-1, with 1 being almost certainly a pass

    Annotations
    @SerialVersionUID()
  28. case class QualityException(msg: String, cause: Throwable = null) extends RuntimeException with Product with Serializable

    Simple marker instead of sys.error

    Annotations
    @SerialVersionUID()
  29. trait ResultStatistics[T <: ResultStatistics[_]] extends Serializable
  30. sealed trait ResultStatisticsProvider[T, R] extends Serializable
  31. sealed trait ResultStatisticsProviderImpl[T <: ResultStatistics[_]] extends Serializable
  32. case class Rule(id: Id, expression: ExpressionRule, runOnPassProcessor: RunOnPassProcessor = NoOpRunOnPassProcessor.noOp) extends Serializable with Product

    A rule to run over a row

    Annotations
    @SerialVersionUID()
  33. case class RuleEngineResult[T](ruleSuiteResults: RuleSuiteResult, salientRule: Option[SalientRule], result: Option[T]) extends Serializable with Product

    Results for all rules run against a DataFrame. Note in debug mode the type of T must be Seq[(Int, ActualType)]

    ruleSuiteResults

    Overall results from applying the engine

    salientRule

    if None, either no rule matched for this row or it is in Debug mode, which returns all results.

    result

    The result type for this row; if no rule matched this will be None, and if a rule matched but the output expression returned null this will also be None

    Annotations
    @SerialVersionUID()
  34. case class RuleFolderResult[T](ruleSuiteResults: RuleSuiteResult, result: Option[T]) extends Serializable with Product

    Results for all rules run against a DataFrame. Note in debug mode the type of T must be Seq[(Int, ActualType)]

    ruleSuiteResults

    Overall results from applying the engine

    result

    The result type for this row; if no rule matched this will be None, and if a rule matched but the output expression returned null this will also be None

    Annotations
    @SerialVersionUID()
  35. sealed trait RuleResult extends Serializable
    Annotations
    @SerialVersionUID()
  36. case class RuleResultRow(ruleSuiteId: Int, ruleSuiteVersion: Int, ruleSuiteResult: Int, ruleSetResult: Int, ruleSetId: Int, ruleSetVersion: Int, ruleId: Int, ruleVersion: Int, ruleResult: Int) extends Product with Serializable

    Flattened results for aggregation / display / use by the explodeResults expression

  37. case class RuleResultWithProcessor(ruleResult: RuleResult, runOnPassProcessor: RunOnPassProcessor) extends RuleResult with Product with Serializable

    Packs a rule result with a RunOnPassProcessor processor

    Annotations
    @SerialVersionUID()
  38. case class RuleRow(ruleSuiteId: Int, ruleSuiteVersion: Int, ruleSetId: Int, ruleSetVersion: Int, ruleId: Int, ruleVersion: Int, ruleExpr: String, ruleEngineSalience: Int = Int.MaxValue, ruleEngineId: Int = 0, ruleEngineVersion: Int = 0) extends Product with Serializable
  39. case class RuleSet(id: Id, rules: Seq[Rule]) extends Serializable with Product
    Annotations
    @SerialVersionUID()
  40. case class RuleSetResult(overallResult: RuleResult, ruleResults: Map[VersionedId, RuleResult]) extends Serializable with Product

    Result collection for a number of rules

    ruleResults

    rule id -> RuleResult

    Annotations
    @SerialVersionUID()
  41. case class RuleSetStatistics(ruleSet: VersionedId, failed: Long = 0, passed: Long = 0, softFailed: Long = 0, disabled: Long = 0, ignored: Long = 0, defaulted: Long = 0, probabilityPassed: Long = 0, probabilityFailed: Long = 0, rules: Map[VersionedId, RuleStatistics] = Map.empty) extends ResultStatistics[RuleSetStatistics] with Product with Serializable

    Aggregated RuleStatistics for a RuleSet

    Annotations
    @SerialVersionUID()
  42. case class RuleStatistics(rule: VersionedId, failed: Long = 0, passed: Long = 0, softFailed: Long = 0, disabled: Long = 0, ignored: Long = 0, defaulted: Long = 0, probabilityPassed: Long = 0, probabilityFailed: Long = 0) extends ResultStatistics[RuleStatistics] with Product with Serializable

    Rule level results aggregated across a set of rows

    Annotations
    @SerialVersionUID()
  43. case class RuleSuite(id: Id, ruleSets: Seq[RuleSet], lambdaFunctions: Seq[LambdaFunction] = Seq.empty, probablePass: Double = defaultProbablePass, defaultProcessor: DefaultProcessor = NoOpDefaultProcessor.noOp) extends Serializable with Product

    Represents a versioned collection of RuleSets

    probablePass

    override to specify a different percentage for treating probability results as passes - defaults to 80% (0.8)

    defaultProcessor

    when using folder or collector this output expression will be used when no rules are triggered

    Annotations
    @SerialVersionUID()
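
    A sketch of assembling a RuleSuite from the pieces above (ExpressionRule is a trait, so the SQL-string construction below is assumed for illustration):

    ```scala
    // A versioned suite containing one RuleSet with one trigger rule.
    val suite = RuleSuite(
      id = Id(id = 1, version = 1),
      ruleSets = Seq(
        RuleSet(
          id = Id(10, 1),
          rules = Seq(
            Rule(Id(100, 1), ExpressionRule("amount > 0")) // hypothetical trigger expression
          )
        )
      )
      // probablePass defaults to 0.8; defaultProcessor defaults to NoOpDefaultProcessor.noOp
    )
    ```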
  44. case class RuleSuiteGroup(ruleSuites: Map[VersionedId, RuleSuite] = Map.empty) extends Product with Serializable

    Represents a grouping of RuleSuites

    Annotations
    @SerialVersionUID()
  45. case class RuleSuiteGroupResults(ruleSuiteResults: Map[VersionedId, RuleSuiteResult] = Map.empty) extends Product with Serializable

    Represents a grouping of RuleSuiteResults, from a call to group_results (Spark 4 only). As the results may be combined for audit trail reasons from both engine and dq results, no overallResult equivalent is provided. Please use the overall_dq and overall_engine functions to process the ruleSuiteResults, filtering the RuleSuite Ids as needed.

    Annotations
    @SerialVersionUID()
  46. case class RuleSuiteGroupStatistics(ruleSuites: Map[VersionedId, RuleSuiteStatistics] = Map.empty, rowCount: Long = 0) extends Serializable with Product

    Convenience as a dataset may contain more than one rule suite

    Annotations
    @SerialVersionUID()
  47. case class RuleSuiteResult(id: VersionedId, overallResult: RuleResult, ruleSetResults: Map[VersionedId, RuleSetResult]) extends Serializable with Product

    Results for all rules run against a dataframe

    id

    the Id of the suite; all other content is mapped

    Annotations
    @SerialVersionUID()
  48. case class RuleSuiteResultDetails(id: VersionedId, ruleSetResults: Map[VersionedId, RuleSetResult]) extends Serializable with Product

    Results for all rules run against a dataframe without the overallResult. Performance differences for filtering on top level fields are significant over nested structures even under Spark 3, in the region of 30-50% depending on op.

    Annotations
    @SerialVersionUID()
  49. case class RuleSuiteRow(ruleSuiteId: Int, ruleSuiteVersion: Int, probablePass: Double, ruleEngineId: Int, ruleEngineVersion: Int) extends Product with Serializable

    Only needed for probability and defaultProcessor support with the optional integrateRuleSuites

  50. case class RuleSuiteStatistics(ruleSuite: VersionedId, failed: Long = 0, passed: Long = 0, softFailed: Long = 0, disabled: Long = 0, ignored: Long = 0, defaulted: Long = 0, probabilityPassed: Long = 0, probabilityFailed: Long = 0, rowCount: Long = 0, ruleSets: Map[VersionedId, RuleSetStatistics] = Map.empty) extends ResultStatistics[RuleSuiteStatistics] with Product with Serializable

    Aggregated RuleSetStatistics for a complete RuleSuite

    Annotations
    @SerialVersionUID()
  51. trait RunOnPassProcessor extends HasRuleText with HasOutputExpression

    Configuration of what should be evaluated when a trigger passes and salience matches

  52. case class SalientRule(ruleSuiteId: VersionedId, ruleSetId: VersionedId, ruleId: VersionedId) extends Product with Serializable

    Represents the rule that matched a given RuleEngine result

    Annotations
    @SerialVersionUID()
  53. case class SimpleField(name: String, dataType: String, nullable: Boolean) extends Product with Serializable

    Used to filter columns for meta RuleSets

  54. case class SparkBloomFilter(bloom: BloomFilter) extends BloomLookup with Product with Serializable
  55. sealed trait VersionedId extends Serializable

    Base for storage of rule or ruleset ids; must be a trait to force frameless to use lookup and stop any accidental auto product treatment

Abstract Value Members

  1. abstract def getClass(): Class[_ <: AnyRef]
    Definition Classes
    Any

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    Any
  2. final def ##: Int
    Definition Classes
    Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    Any
  4. val DefaultRuleInt: Int

    The integer value for RuleSuiteResult.overallResult when no trigger rules have run and the default rule was used

    Definition Classes
    RuleRunnerImports
  5. val DefaultRuleSalience: Int

    When Folder is configured with a default rule and debug mode is enabled, this salience is returned

    Definition Classes
    RuleRunnerImports
  6. val DisabledRuleInt: Int

    The integer value for disabled dq rules

    Definition Classes
    RuleRunnerImports
  7. val FailedInt: Int

    The integer value for failed dq rules

    Definition Classes
    RuleRunnerImports
  8. val IgnoredRuleInt: Int

    The integer value for ignored dq rules

    Definition Classes
    RuleRunnerImports
  9. val PassedInt: Int

    The integer value for passed dq rules

    Definition Classes
    RuleRunnerImports
  10. val SoftFailedInt: Int

    The integer value for soft failed dq rules

    Definition Classes
    RuleRunnerImports
  11. def addDataQuality(dataFrame: DataFrame, rules: RuleSuite, name: String = "DataQuality"): DataFrame

    Adds a DataQuality field using the RuleSuite and RuleSuiteResult structure

    Definition Classes
    AddDataFunctionsImports
  12. def addDataQualityF(rules: RuleSuite, name: String = "DataQuality"): (DataFrame) => DataFrame

    Adds a DataQuality field using the RuleSuite and RuleSuiteResult structure for use with dataset.transform functions

    Definition Classes
    AddDataFunctionsImports
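
    A usage sketch for both variants (`df` and `suite` are assumed to exist):

    ```scala
    // Direct form: adds a "DataQuality" column holding the RuleSuiteResult structure.
    val withDQ = addDataQuality(df, suite)

    // Curried form: composes with Dataset.transform pipelines.
    val withDQ2 = df.transform(addDataQualityF(suite, name = "DataQuality"))
    ```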
  13. def addExpressionRunner(dataFrame: DataFrame, ruleSuite: RuleSuite, name: String = "expressionResults", renderOptions: Map[String, String] = Map.empty, ddlType: String = "", stripDDL: Boolean = false, postProcess: (DataFrame) => DataFrame = identity): DataFrame

    Runs the ruleSuite expressions, saving results as a tuple of (ruleResult: yaml, resultType: String). Supplying a ddlType triggers the output type for the expression to be that ddl type, rather than using yaml conversion.

    name

    the default column name "expressionResults"

    renderOptions

    provides rendering options to the underlying snake yaml implementation

    ddlType

    optional DDL string, when present yaml output is disabled and the output expressions must all have the same type

    stripDDL

    the default of false leaves the DDL present, true strips ddl from the output using strip_result_ddl

    Definition Classes
    AddDataFunctionsImports
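
    A usage sketch (`df` and `suite` assumed; the DDL type is illustrative):

    ```scala
    // Default: results stored as (ruleResult: yaml, resultDDL: String) pairs.
    val yamlResults = addExpressionRunner(df, suite)

    // With a ddlType all output expressions must share that type; stripDDL = true
    // removes the DDL from the output via strip_result_ddl.
    val typedResults = addExpressionRunner(df, suite, ddlType = "DOUBLE", stripDDL = true)
    ```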
  14. def addExpressionRunnerF[P[R] >: Dataset[R]](ruleSuite: RuleSuite, name: String = "expressionResults", renderOptions: Map[String, String] = Map.empty, ddlType: String = "", stripDDL: Boolean = false, postProcess: (DataFrame) => DataFrame = identity): (P[Row]) => P[Row]

    Runs the ruleSuite expressions, saving results as a tuple of (ruleResult: yaml, resultType: String). Supplying a ddlType triggers the output type for the expression to be that ddl type, rather than using yaml conversion.

    name

    the default column name "expressionResults"

    renderOptions

    provides rendering options to the underlying snake yaml implementation

    ddlType

    optional DDL string, when present yaml output is disabled and the output expressions must all have the same type

    stripDDL

    the default of false leaves the DDL present, true strips ddl from the output using strip_result_ddl

    Definition Classes
    AddDataFunctionsImports
  15. def addOverallResultsAndDetails(dataFrame: DataFrame, rules: RuleSuite, overallResult: String = "DQ_overallResult", resultDetails: String = "DQ_Details"): DataFrame

    Adds two columns, one for overallResult and the other the details, allowing 30-50% performance gains for simple filters

    Definition Classes
    AddDataFunctionsImports
  16. def addOverallResultsAndDetailsF(rules: RuleSuite, overallResult: String = "DQ_overallResult", resultDetails: String = "DQ_Details"): (DataFrame) => DataFrame

    Adds two columns, one for overallResult and the other the details, allowing 30-50% performance gains for simple filters, for use in dataset.transform functions

    Definition Classes
    AddDataFunctionsImports
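
    A usage sketch (`df` and `suite` assumed; filtering on the top-level column is where the cited 30-50% gains apply):

    ```scala
    // Splits the result into two columns for cheaper top-level filtering.
    val split = addOverallResultsAndDetails(df, suite)

    // Illustrative filter on the overall result column
    // (assuming the integer result encoding, e.g. FailedInt).
    val failed = split.filter(col("DQ_overallResult") === FailedInt)
    ```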
  17. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  18. def collectRunner(ruleSuite: RuleSuite, resultDataType: Option[DataType] = None, variablesPerFunc: Int = 40, variableFuncGroup: Int = 20, flatten: Boolean = true, includeNulls: Boolean = false, useInPlaceArray: Boolean = true, unrollInPlaceArray: Boolean = false, unrollOutputArraySize: Int = 1): Column

    Creates a column that runs the folding RuleSuite. This also forces registering the lambda functions used by that RuleSuite.

    FolderRunner runs all output expressions for matching rules in order of salience; the startingStruct is passed to the first matching rule, its result to the second, and so on. In contrast to ruleEngineRunner, OutputExpressions should be lambdas with one parameter, that of the structure

    ruleSuite

    The ruleSuite with runOnPassProcessors

    resultDataType

    the collected result type, specify this if the derived types have nullability or ordering issues. By default, it takes the type of the last output expression

    variablesPerFunc

    Defaulting to 40; in combination with variableFuncGroup this allows customisation of handling the 64k JVM method size limitation when performing WholeStageCodeGen

    variableFuncGroup

    Defaulting to 20

    flatten

    when resultType is an ArrayType should the result be flattened

    includeNulls

    should nulls returned by the output expressions be included; note that when flattening, nulls IN the returned arrays are not filtered

    useInPlaceArray

    defaulting to true, this replaces the array handling approach for 'array' Output Expressions

    unrollInPlaceArray

    defaulting to false, when true, unrolls array copying for InPlaceArray usage, it _could_ be faster in some circumstances but in most cases trust the JVM's JIT

    unrollOutputArraySize

    defaulting to 1 and only used when unrollInPlaceArray is true. When higher, decides how many operations should be unrolled for array copies per loop, setting to a higher number than all OutputExpression array sizes would force all array copying to be inlined. It _could_ be faster in some circumstances but in most cases trust the JVM's JIT to unroll the loops.

    returns

    A Column representing the QualityRules expression built from this ruleSuite

    Definition Classes
    CollectRunnerImports
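
    A usage sketch (`df` and `suite` assumed; pinning resultDataType follows the parameter advice above for derived-type nullability or ordering issues):

    ```scala
    import org.apache.spark.sql.types.{ArrayType, StringType}

    // Runs the folding RuleSuite as a column alongside the original fields.
    val collected = df.select(
      col("*"),
      collectRunner(suite, resultDataType = Some(ArrayType(StringType))).as("collected")
    )
    ```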
  19. def combine(ruleRows: Dataset[RuleRow], lambdaFunctionRows: Dataset[LambdaFunctionRow], outputExpressionRows: Dataset[OutputExpressionRow], globalLambdaSuites: Dataset[Id], ruleSuites: Dataset[RuleSuiteRow])(implicit encoder: Encoder[CombinedRuleSuiteRows]): Dataset[CombinedRuleSuiteRows]

    Combines ruleRows, lambdaFunctionRows and outputExpressionRows into CombinedRuleSuiteRows

    Definition Classes
    VersionSpecificSerializingImports
  20. def combine(ruleRows: Dataset[RuleRow], lambdaFunctionRows: Dataset[LambdaFunctionRow], outputExpressionRows: Dataset[OutputExpressionRow], globalLambdaSuites: Dataset[Id], globalOutputExpressionSuites: Dataset[Id], ruleSuites: Dataset[RuleSuiteRow])(implicit encoder: Encoder[CombinedRuleSuiteRows]): Dataset[CombinedRuleSuiteRows]

    Combines ruleRows, lambdaFunctionRows and outputExpressionRows into CombinedRuleSuiteRows

    Definition Classes
    VersionSpecificSerializingImports
  21. def combine(ruleRows: Dataset[RuleRow], lambdaFunctionRows: Dataset[LambdaFunctionRow], outputExpressionRows: Dataset[OutputExpressionRow], ruleSuites: Dataset[RuleSuiteRow])(implicit encoder: Encoder[CombinedRuleSuiteRows]): Dataset[CombinedRuleSuiteRows]

    Combines ruleRows, lambdaFunctionRows and outputExpressionRows into CombinedRuleSuiteRows, using probablePass

    Definition Classes
    VersionSpecificSerializingImports
  22. def combine(ruleRows: Dataset[RuleRow])(implicit encoder: Encoder[CombinedRuleSuiteRows]): Dataset[CombinedRuleSuiteRows]

    Uses ruleRows to make CombinedRuleSuiteRows

    Definition Classes
    VersionSpecificSerializingImports
  23. def combine(ruleRows: Dataset[RuleRow], lambdaFunctionRows: Dataset[LambdaFunctionRow])(implicit encoder: Encoder[CombinedRuleSuiteRows]): Dataset[CombinedRuleSuiteRows]

    Combines ruleRows and lambdaFunctionRows into CombinedRuleSuiteRows

    Definition Classes
    VersionSpecificSerializingImports
  24. def combine(ruleRows: Dataset[RuleRow], lambdaFunctionRows: Dataset[LambdaFunctionRow], outputExpressionRows: Dataset[OutputExpressionRow])(implicit encoder: Encoder[CombinedRuleSuiteRows]): Dataset[CombinedRuleSuiteRows]

    Combines ruleRows, lambdaFunctionRows and outputExpressionRows into CombinedRuleSuiteRows

    Definition Classes
    VersionSpecificSerializingImports
  25. def combine(ruleRows: Dataset[RuleRow], lambdaFunctionRows: Option[Dataset[LambdaFunctionRow]] = None, outputExpressionRows: Option[Dataset[OutputExpressionRow]] = None, globalLambdaSuites: Option[Dataset[Id]] = None, globalOutputExpressionSuites: Option[Dataset[Id]] = None, ruleSuites: Option[Dataset[RuleSuiteRow]] = None)(implicit encoder: Encoder[CombinedRuleSuiteRows]): Dataset[CombinedRuleSuiteRows]

    Combines ruleRows, lambdaFunctionRows and outputExpressionRows into CombinedRuleSuiteRows

    Definition Classes
    VersionSpecificSerializingImports
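
    A sketch of the three-Dataset overload (the row Datasets are assumed to exist, with the implicit Encoder provided, e.g. via the implicits package):

    ```scala
    // Joins the raw rule, lambda and output-expression rows into the combined model.
    val combined: Dataset[CombinedRuleSuiteRows] =
      combine(ruleRows, lambdaFunctionRows, outputExpressionRows)
    ```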
  26. def combined_rows(ruleSuite: RuleSuite): Dataset[CombinedRuleSuiteRows]
  27. def dqRuleRunner(ruleSuite: RuleSuite, variablesPerFunc: Int = 40, variableFuncGroup: Int = 20): Column

    Creates a column that runs the RuleSuite suitable for DQ / Validation. This also forces registering the lambda functions used by that RuleSuite.

    ruleSuite

    The Quality RuleSuite to evaluate

    variablesPerFunc

    Defaulting to 40; in combination with variableFuncGroup this allows customisation of handling the 64k JVM method size limitation when performing WholeStageCodeGen. You _shouldn't_ need it, but it's there just in case.

    variableFuncGroup

    Defaulting to 20

    returns

    A Column representing the Quality DQ expression built from this ruleSuite

    Definition Classes
    RuleRunnerImports
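
    A usage sketch (`df` and `suite` assumed; the codegen tuning parameters keep their defaults):

    ```scala
    // Evaluates the DQ RuleSuite per row into a single result column.
    val checked = df.select(col("*"), dqRuleRunner(suite).as("DataQuality"))
    ```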
  28. def equals(arg0: Any): Boolean
    Definition Classes
    Any
  29. def expressionRunner(ruleSuite: RuleSuite, name: String = "expressionResults", renderOptions: Map[String, String] = Map.empty): Column
    Definition Classes
    ExpressionRunnerImports
  30. def foldAndReplaceFieldPairs[P[R] >: Dataset[R]](rules: RuleSuite, fields: Seq[(String, Column)], foldFieldName: String = "foldedFields", debugMode: Boolean = false, tempFoldDebugName: String = "tempFOLDDEBUG", maintainOrder: Boolean = true, alias: String = "main"): (P[Row]) => P[Row]

    Leverages the foldRunner to replace fields; the input field pairs are used to create a structure that the rules fold over. Any field references in the provided pairs are then dropped from the original Dataframe and added back from the resulting structure.

    NOTE: The field order and types of the original DF will be maintained only when maintainOrder is true. As it requires access to the schema it may incur extra work.

    debugMode

    when true the last results are taken for the replaced fields

    maintainOrder

    when true the schema is used to replace fields in the correct location, when false they are simply appended

    Definition Classes
    AddDataFunctionsImports
  31. def foldAndReplaceFieldPairsWithStruct[P[R] >: Dataset[R]](rules: RuleSuite, fields: Seq[(String, Column)], struct: StructType, foldFieldName: String = "foldedFields", debugMode: Boolean = false, tempFoldDebugName: String = "tempFOLDDEBUG", maintainOrder: Boolean = true, alias: String = "main"): (P[Row]) => P[Row]

    Leverages the foldRunner to replace fields; the input field pairs are used to create a structure that the rules fold over, and the type must match the provided struct. Any field references in the provided pairs are then dropped from the original Dataframe and added back from the resulting structure.

    This version should only be used when you require select(*, foldRunner) to be used; it requires you to fully specify types.

    NOTE: The field order and types of the original DF will be maintained only when maintainOrder is true. As it requires access to the schema it may incur extra work.

    struct

    The fields, and types, are used to call the foldRunner. These types must match in the input fields

    debugMode

    when true the last results are taken for the replaced fields

    maintainOrder

    when true the schema is used to replace fields in the correct location, when false they are simply appended

    Definition Classes
    AddDataFunctionsImports
  32. def foldAndReplaceFields[P[R] >: Dataset[R]](rules: RuleSuite, fields: Seq[String], foldFieldName: String = "foldedFields", debugMode: Boolean = false, tempFoldDebugName: String = "tempFOLDDEBUG", maintainOrder: Boolean = true, alias: String = "main"): (P[Row]) => P[Row]

    Leverages the foldRunner to replace fields, the input fields are used to create a structure that the rules fold over.

    Leverages the foldRunner to replace fields, the input fields are used to create a structure that the rules fold over. The fields are then dropped from the original Dataframe and added back from the resulting structure.

    NOTE: The field order and types of the original DF will be maintained only when maintainOrder is true. As it requires access to the schema it may incur extra work.

    debugMode

    when true the last results are taken for the replaced fields

    maintainOrder

    when true the schema is used to replace fields in the correct location, when false they are simply appended

    Definition Classes
    AddDataFunctionsImports
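A minimal usage sketch for foldAndReplaceFields, assuming an active SparkSession and a hypothetical RuleSuite whose folder rules rewrite two illustrative fields, price and quantity:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row}

// Hypothetical RuleSuite, e.g. assembled via readRulesFromDF + integrateLambdas.
val rules: RuleSuite = ???

// Build a reusable transform: the named fields are folded over by the rules,
// dropped from the original DataFrame, and added back from the fold result.
// maintainOrder = true (the default) preserves the original schema order.
val replaceFields: Dataset[Row] => Dataset[Row] =
  foldAndReplaceFields(rules, Seq("price", "quantity"))

val df: DataFrame = ??? // input containing price and quantity columns
val result = replaceFields(df)
```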
  33. def foldAndReplaceFieldsWithStruct[P[R] >: Dataset[R]](rules: RuleSuite, struct: StructType, foldFieldName: String = "foldedFields", debugMode: Boolean = false, tempFoldDebugName: String = "tempFOLDDEBUG", maintainOrder: Boolean = true, alias: String = "main"): (P[Row]) => P[Row]

    Leverages the foldRunner to replace fields, the input fields are used to create a structure that the rules fold over.

    Leverages the foldRunner to replace fields, the input fields are used to create a structure that the rules fold over. The fields are then dropped from the original Dataframe and added back from the resulting structure.

    This version should only be used when you require select(*, foldRunner) to be used; it requires you to fully specify the types.

    NOTE: The field order and types of the original DF will be maintained only when maintainOrder is true. As it requires access to the schema it may incur extra work.

    struct

    The fields and types used to call the foldRunner. These types must match those of the input fields

    debugMode

    when true the last results are taken for the replaced fields

    maintainOrder

    when true the schema is used to replace fields in the correct location, when false they are simply appended

    Definition Classes
    AddDataFunctionsImports
  34. def getConfig(name: String, default: String = ""): String

    First attempts to read the system environment variable, then the Java system property, then the SQLConf, falling back to the supplied default.
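For illustration only, using a hypothetical setting name (resolution order as documented: environment variable, then Java system property, then SQLConf):

```scala
// Falls back to "disabled" when "quality.someFeature" (a made-up name)
// is not set in the environment, system properties, or SQLConf.
val mode: String = getConfig("quality.someFeature", default = "disabled")
```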

  35. def hashCode(): Int
    Definition Classes
    Any
  36. def integrateLambdas(ruleSuiteMap: RuleSuiteMap, lambdas: Map[Id, Seq[LambdaFunction]], globalLibrary: Option[Id] = None): RuleSuiteMap

    Add any of the Lambdas loaded for the rule suites

    globalLibrary

    all lambdas with this RuleSuite Id will be added to all RuleSuites

    Definition Classes
    SerializingImports
  37. def integrateMetaRuleSets(dataFrame: DataFrame, ruleSuiteMap: RuleSuiteMap, metaRuleSetMap: Map[Id, Seq[MetaRuleSetRow]], stablePosition: (String) => Int, transform: (DataFrame) => DataFrame = identity): RuleSuiteMap

    Integrates meta rulesets into the rulesuites.

    Integrates meta rulesets into the rulesuites. Note this only works for a specific dataset; if rulesuites should be filtered for a given dataset then this must take place before calling.

    dataFrame

    the dataframe to identify columns for metarules

    ruleSuiteMap

    the ruleSuites relevant for this dataframe

    metaRuleSetMap

    the meta rulesets

    stablePosition

    this function must maintain the law that each column name within a RuleSet always generates the same position

    Definition Classes
    SerializingImports
  38. def integrateOutputExpressions(ruleSuiteMap: RuleSuiteMap, outputs: Map[Id, Seq[OutputExpressionRow]], globalLibrary: Option[Id] = None): (RuleSuiteMap, Map[Id, Set[Rule]])

    Returns an integrated ruleSuiteMap with a set of RuleSuite Id -> Rule mappings where the OutputExpression didn't exist.

    Returns an integrated ruleSuiteMap with a set of RuleSuite Id -> Rule mappings where the OutputExpression didn't exist. If defaultProcessor is expected to be used then call integrateRuleSuites *before*.

    Users should check if their RuleSuite is in the "error" map. The isAMissingRuleSuiteRule can be used to identify if a Rule is referring to a missing RuleSuite.defaultProcessor

    Definition Classes
    SerializingImports
  39. def integrateRuleSuites(ruleSuiteMap: RuleSuiteMap, ruleSuites: Map[Id, RuleSuiteRow]): RuleSuiteMap

    Returns an integrated ruleSuiteMap with probablePass and defaultOutputExpression applied

    Call this before calling integrateOutputExpressions.

    Definition Classes
    SerializingImports
  40. def isAMissingRuleSuiteRule(rule: Rule): Boolean

    Identify if the missing OutputExpression from a RuleSuite is from a defaultProcessor

    Definition Classes
    SerializingImports
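The serializing functions above are typically chained; a sketch of the documented ordering (integrateRuleSuites before integrateOutputExpressions), where the column names are assumptions about the backing tables:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

val rulesDF: DataFrame = ???   // hypothetical rule rows table
val lambdasDF: DataFrame = ??? // hypothetical lambda rows table
val outputsDF: DataFrame = ??? // hypothetical output expression rows table

// 1. Load the raw rule suites
var ruleSuiteMap = readRulesFromDF(rulesDF,
  col("ruleSuiteId"), col("ruleSuiteVersion"),
  col("ruleSetId"), col("ruleSetVersion"),
  col("ruleId"), col("ruleVersion"), col("ruleExpr"))

// 2. Integrate lambdas (optionally passing a globalLibrary id)
val lambdas = readLambdasFromDF(lambdasDF,
  col("name"), col("expr"), col("id"), col("version"),
  col("ruleSuiteId"), col("ruleSuiteVersion"))
ruleSuiteMap = integrateLambdas(ruleSuiteMap, lambdas)

// If defaultProcessor is needed, call
// integrateRuleSuites(ruleSuiteMap, readRuleSuitesFromDF(...)) *before* step 3.

// 3. Integrate output expressions and inspect the error map
val outputs = readOutputExpressionsFromDF(outputsDF,
  col("expr"), col("id"), col("version"),
  col("ruleSuiteId"), col("ruleSuiteVersion"))
val (integrated, missing) = integrateOutputExpressions(ruleSuiteMap, outputs)

// Rules flagged only because a defaultProcessor is absent can be filtered out
val realErrors = missing.map { case (id, rs) =>
  id -> rs.filterNot(isAMissingRuleSuiteRule)
}.filter(_._2.nonEmpty)
```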
  41. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  42. def loadMapConfigs(loader: DataFrameLoader, viewDF: DataFrame, name: Column, token: Column, filter: Column, sql: Column, key: Column, value: Column): (Seq[MapConfig], Set[String])

    Loads map configurations from a given DataFrame.

    Loads map configurations from a given DataFrame. Wherever token is present, the loader will be called and the filter optionally applied.

    returns

    A tuple of MapConfigs and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    MapLookupImportsShared
  43. def loadMapConfigs(loader: DataFrameLoader, viewDF: DataFrame, ruleSuiteIdColumn: Column, ruleSuiteVersionColumn: Column, ruleSuiteId: Id, name: Column, token: Column, filter: Column, sql: Column, key: Column, value: Column): (Seq[MapConfig], Set[String])

    Loads map configurations from a given DataFrame for ruleSuiteId.

    Loads map configurations from a given DataFrame for ruleSuiteId. Wherever token is present, the loader will be called and the filter optionally applied.

    returns

    A tuple of MapConfigs and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    MapLookupImportsShared
  44. def loadMaps(configs: Seq[MapConfig]): MapLookups
    Definition Classes
    MapLookupImportsShared
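A sketch of loading broadcast map lookups via loadMapConfigs and loadMaps; the column names below are assumptions about your config table layout:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

val loader: DataFrameLoader = ??? // your DataFrameLoader implementation
val mapDF: DataFrame = ???        // hypothetical map-config table

val (configs, badRows) = loadMapConfigs(loader, mapDF,
  col("name"), col("token"), col("filter"), col("sql"),
  col("key"), col("value"))

// Rows with neither token nor sql are reported rather than loaded
if (badRows.nonEmpty) sys.error(s"bad map config rows: $badRows")

// Broadcast variables needed for exact lookup and mapping checks
val lookups: MapLookups = loadMaps(configs)
```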
  45. def loadViewConfigs(loader: DataFrameLoader, viewDF: DataFrame, name: Column, token: Column, filter: Column, sql: Column): (Seq[ViewConfig], Set[String])

    Loads view configurations from a given DataFrame.

    Loads view configurations from a given DataFrame. Wherever token is present, the loader will be called and the filter optionally applied.

    returns

    A tuple of ViewConfigs and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    ViewLoading
  46. def loadViewConfigs(loader: DataFrameLoader, viewDF: DataFrame, ruleSuiteIdColumn: Column, ruleSuiteVersionColumn: Column, ruleSuiteId: Id, name: Column, token: Column, filter: Column, sql: Column): (Seq[ViewConfig], Set[String])

    Loads view configurations from a given DataFrame for ruleSuiteId.

    Loads view configurations from a given DataFrame for ruleSuiteId. Wherever token is present, the loader will be called and the filter optionally applied.

    returns

    A tuple of ViewConfigs and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    ViewLoading
  47. def loadViews(viewConfigs: Seq[ViewConfig]): ViewLoadResults

    Attempts to load all the views present in the config.

    Attempts to load all the views present in the config. If a view is already registered in that name it will be replaced.

    returns

    the names of views which have been replaced

    Definition Classes
    ViewLoading
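A sketch combining loadViewConfigs and loadViews; the column names are assumptions about a hypothetical view-config table:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

val loader: DataFrameLoader = ??? // your DataFrameLoader implementation
val viewDF: DataFrame = ???       // hypothetical view-config table

val (viewConfigs, failed) = loadViewConfigs(loader, viewDF,
  col("name"), col("token"), col("filter"), col("sql"))

// Registers each view; any temp view already registered under the same
// name is replaced, and the replaced names are reported in the results.
val results: ViewLoadResults = loadViews(viewConfigs)
```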
  48. def mapLookupsFromDFs(creators: Map[String, MapCreator]): MapLookups

    Loads maps to broadcast; each individual dataframe may have different associated expressions

    creators

    a map of string id to MapCreator

    returns

    a map of id to broadcast variables needed for exact lookup and mapping checks

    Definition Classes
    MapLookupImportsShared
  49. def process_if_attribute_missing(ruleSuite: Column): String

    Processes a given RuleSuite to replace any coalesceIfMissingAttributes.

    Processes a given RuleSuite to replace any coalesceIfMissingAttributes. This may be called before validate / docs but *must* be called *before* adding the ruleSuite to a dataframe.

    ruleSuite

    a Column representing a rule suite from register_rule_suite_variable

    Definition Classes
    VariableProcessIfMissing
  50. def process_if_attribute_missing(ruleSuite: Column, schema: StructType): String

    Processes a given RuleSuite to replace any coalesceIfMissingAttributes.

    Processes a given RuleSuite to replace any coalesceIfMissingAttributes. This may be called before validate / docs but *must* be called *before* adding the ruleSuite to a dataframe.

    ruleSuite

    a Column representing a rule suite from register_rule_suite_variable

    schema

    The names to validate against; if empty, no attempt to process coalesceIfAttributeMissing will be made

    Definition Classes
    VariableProcessIfMissing
  51. def process_if_attribute_missing(ruleSuite: Column, schema: StructType, stableName: String): String

    Processes a given RuleSuite to replace any coalesceIfMissingAttributes.

    Processes a given RuleSuite to replace any coalesceIfMissingAttributes. This may be called before validate / docs but *must* be called *before* adding the ruleSuite to a dataframe.

    ruleSuite

    a Column representing a rule suite from register_rule_suite_variable

    schema

    The names to validate against; if empty, no attempt to process coalesceIfAttributeMissing will be made

    Definition Classes
    VariableProcessIfMissing
  52. def process_if_attribute_missing_col(ruleSuite: Column, schema: StructType, stableName: String): Column

    Processes a given RuleSuite to replace any coalesceIfMissingAttributes.

    Processes a given RuleSuite to replace any coalesceIfMissingAttributes. This may be called before validate / docs but *must* be called *before* adding the ruleSuite to a dataframe.

    ruleSuite

    a Column representing a rule suite from register_rule_suite_variable

    schema

    The names to validate against; if empty, no attempt to process coalesceIfAttributeMissing will be made

    Definition Classes
    VariableProcessIfMissing
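A sketch of the variable-based flow, assuming ruleSuiteVar came from register_rule_suite_variable and df is the target DataFrame; per the documentation this must run before the rule suite is added to the DataFrame:

```scala
import org.apache.spark.sql.{Column, DataFrame}

val df: DataFrame = ???
val ruleSuiteVar: Column = ??? // from register_rule_suite_variable

// Rewrites coalesceIfMissingAttributes against df's schema; "myRules"
// is a hypothetical stable name for the processed variable.
val processed: Column =
  process_if_attribute_missing_col(ruleSuiteVar, df.schema, "myRules")
```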
  53. def readLambdaRowsFromDF(lambdaFunctionDF: DataFrame, lambdaFunctionName: Column, lambdaFunctionExpression: Column, lambdaFunctionId: Column, lambdaFunctionVersion: Column, lambdaFunctionRuleSuiteId: Column, lambdaFunctionRuleSuiteVersion: Column): Dataset[LambdaFunctionRow]

    Loads lambda functions

    Definition Classes
    SerializingImports
  54. def readLambdasFromDF(lambdaFunctionDF: DataFrame, lambdaFunctionName: Column, lambdaFunctionExpression: Column, lambdaFunctionId: Column, lambdaFunctionVersion: Column, lambdaFunctionRuleSuiteId: Column, lambdaFunctionRuleSuiteVersion: Column): Map[Id, Seq[LambdaFunction]]

    Loads lambda functions

    Definition Classes
    SerializingImports
  55. def readMetaRuleSetsFromDF(metaRuleSetDF: DataFrame, columnSelectionExpression: Column, lambdaFunctionExpression: Column, metaRuleSetId: Column, metaRuleSetVersion: Column, metaRuleSuiteId: Column, metaRuleSuiteVersion: Column): Map[Id, Seq[MetaRuleSetRow]]
    Definition Classes
    SerializingImports
  56. def readOutputExpressionRowsFromDF(outputExpressionDF: DataFrame, outputExpression: Column, outputExpressionId: Column, outputExpressionVersion: Column, outputExpressionRuleSuiteId: Column, outputExpressionRuleSuiteVersion: Column): Dataset[OutputExpressionRow]

    Loads output expressions

    Definition Classes
    SerializingImports
  57. def readOutputExpressionsFromDF(outputExpressionDF: DataFrame, outputExpression: Column, outputExpressionId: Column, outputExpressionVersion: Column, outputExpressionRuleSuiteId: Column, outputExpressionRuleSuiteVersion: Column): Map[Id, Seq[OutputExpressionRow]]

    Loads output expressions

    Definition Classes
    SerializingImports
  58. def readRuleRowsFromDF(df: DataFrame, ruleSuiteId: Column, ruleSuiteVersion: Column, ruleSetId: Column, ruleSetVersion: Column, ruleId: Column, ruleVersion: Column, ruleExpr: Column, outputExpressionSalience: Column, outputExpressionId: Column, outputExpressionVersion: Column): Dataset[RuleRow]

    Loads RuleRows from a dataframe with integers ruleSuiteId, ruleSuiteVersion, ruleSetId, ruleSetVersion, ruleId, ruleVersion and an expression string ruleExpr

    Definition Classes
    SerializingImports
  59. def readRuleRowsFromDF(df: DataFrame, ruleSuiteId: Column, ruleSuiteVersion: Column, ruleSetId: Column, ruleSetVersion: Column, ruleId: Column, ruleVersion: Column, ruleExpr: Column, ruleEngine: Option[(Column, Column, Column)] = None): Dataset[RuleRow]

    Loads RuleRows from a dataframe with integers ruleSuiteId, ruleSuiteVersion, ruleSetId, ruleSetVersion, ruleId, ruleVersion and an expression string ruleExpr

    Definition Classes
    SerializingImports
  60. def readRuleSuitesFromDF(ruleSuites: Dataset[RuleSuiteRow]): Map[Id, RuleSuiteRow]

    Loads RuleSuite specific attributes to use with integrateRuleSuites

    Definition Classes
    SerializingImports
  61. def readRulesFromDF(df: DataFrame, ruleSuiteId: Column, ruleSuiteVersion: Column, ruleSetId: Column, ruleSetVersion: Column, ruleId: Column, ruleVersion: Column, ruleExpr: Column, ruleEngineSalience: Column, ruleEngineId: Column, ruleEngineVersion: Column): RuleSuiteMap

    Loads a RuleSuite from a dataframe with integers ruleSuiteId, ruleSuiteVersion, ruleSetId, ruleSetVersion, ruleId, ruleVersion and an expression string ruleExpr

    Definition Classes
    SerializingImports
  62. def readRulesFromDF(df: DataFrame, ruleSuiteId: Column, ruleSuiteVersion: Column, ruleSetId: Column, ruleSetVersion: Column, ruleId: Column, ruleVersion: Column, ruleExpr: Column): RuleSuiteMap

    Loads a RuleSuite from a dataframe with integers ruleSuiteId, ruleSuiteVersion, ruleSetId, ruleSetVersion, ruleId, ruleVersion and an expression string ruleExpr

    Definition Classes
    SerializingImports
  63. def registerLambdaFunctions(functions: Seq[LambdaFunction]): Unit

    This function is for use with directly loading lambda functions to use within normal spark apps.

    Please be aware that loading rules via ruleRunner, addOverallResultsAndDetails or addDataQuality will also load any registered rules applicable for that ruleSuite and may override other registered rules. For that reason you should not mix direct use of this function with RuleSuites (otherwise your results will not be reproducible).

    Definition Classes
    LambdaFunctionsImports
  64. def registerQualityFunctions(): Unit

    Simplified registerQualityFunctions, use the classicFunctions import when the other features are needed.

    Must be called before using any functions like Passed, Failed or Probability(X) when using classic; a no-op when using connect with the SparkSessionExtension

  65. def registerTempViewNameFromDS[T](lambdaFunctionRows: Option[Dataset[T]]): String
    Attributes
    protected
    Definition Classes
    VersionSpecificSerializingImports
  66. def register_rule_suite(ruleSuite: RuleSuite): String

    Registers a ruleSuite directly as a Spark Variable (with object stream encoding)

    Definition Classes
    VersionSpecificSerializingImports
  67. def register_rule_suite(ruleSuite: RuleSuite, stableName: String): String

    Registers a ruleSuite directly as a Spark Variable (with object stream encoding).

    Registers a ruleSuite directly as a Spark Variable (with object stream encoding). Where possible using the CombinedRuleSuiteRows should be preferred and manage the ruleSuites on the server.

    Definition Classes
    VersionSpecificSerializingImports
  68. def register_rule_suite_group(group: RuleSuiteGroup, stableName: String): String

    Registers the RuleSuiteGroup with a Spark Variable with the provided stable id

    returns

    stableName

    Definition Classes
    VersionSpecificSerializingImports
  69. def register_rule_suite_group(group: RuleSuiteGroup): String

    Registers the RuleSuiteGroup as a Spark Variable

    returns

    stableName

    Definition Classes
    VersionSpecificSerializingImports
  70. def register_rule_suite_group_variable(ds: Dataset[CombinedRuleSuiteRows], stableName: String): String

    Registers the RuleSuiteGroup represented by the CombinedRuleSuiteRows with a Spark Variable with the provided stable id.

    returns

    stableName

    Definition Classes
    VersionSpecificSerializingImports
  71. def register_rule_suite_group_variable(ds: Dataset[CombinedRuleSuiteRows]): String

    Registers the RuleSuiteGroup represented by the CombinedRuleSuiteRows as a Spark Variable

    returns

    stableName

    Definition Classes
    VersionSpecificSerializingImports
  72. def register_rule_suite_variable(ds: Dataset[CombinedRuleSuiteRows], id: VersionedId, stableName: String): String

    Registers the RuleSuite with a Spark Variable with the provided stable id

    returns

    stableName

    Definition Classes
    VersionSpecificSerializingImports
  73. def register_rule_suite_variable(ds: Dataset[CombinedRuleSuiteRows], id: VersionedId): String

    Registers the RuleSuite with a Spark Variable with a unique_id

    returns

    the variable name

    Definition Classes
    VersionSpecificSerializingImports
  74. def ruleEngineRunner(ruleSuite: RuleSuite, resultDataType: DataType): Column

    Creates a column that runs the RuleSuite.

    Creates a column that runs the RuleSuite. This also forces registering the lambda functions used by that RuleSuite

    ruleSuite

    The ruleSuite with runOnPassProcessors

    resultDataType

    The type of the results from runOnPassProcessors - must be the same for all result types; by default most fields will be nullable, and encoding must follow the fields when not specified.

    returns

    A Column representing the QualityRules expression built from this ruleSuite

    Definition Classes
    RuleEngineRunnerImports
  75. def ruleEngineRunner(ruleSuite: RuleSuite, resultDataType: DataType, debugMode: Boolean): Column

    Creates a column that runs the RuleSuite.

    Creates a column that runs the RuleSuite. This also forces registering the lambda functions used by that RuleSuite

    ruleSuite

    The ruleSuite with runOnPassProcessors

    resultDataType

    The type of the results from runOnPassProcessors - must be the same for all result types; by default most fields will be nullable, and encoding must follow the fields when not specified.

    debugMode

    When debugMode is enabled the resultDataType is wrapped in Array of (salience, result) pairs to ease debugging

    returns

    A Column representing the QualityRules expression built from this ruleSuite

    Definition Classes
    RuleEngineRunnerImports
  76. def ruleEngineRunner(ruleSuite: RuleSuite, resultDataType: Option[DataType] = None, debugMode: Boolean = false, variablesPerFunc: Int = 40, variableFuncGroup: Int = 20, forceRunnerEval: Boolean = false, forceTriggerEval: Boolean = false): Column

    Creates a column that runs the RuleSuite.

    Creates a column that runs the RuleSuite. This also forces registering the lambda functions used by that RuleSuite

    ruleSuite

    The ruleSuite with runOnPassProcessors

    resultDataType

    The type of the results from runOnPassProcessors - must be the same for all result types, by default most fields will be nullable and encoding must follow the fields when not specified.

    debugMode

    When debugMode is enabled the resultDataType is wrapped in Array of (salience, result) pairs to ease debugging

    variablesPerFunc

    Defaulting to 40; in combination with variableFuncGroup this allows customisation of the handling of the 64k JVM method size limitation when performing WholeStageCodeGen

    variableFuncGroup

    Defaulting to 20

    returns

    A Column representing the QualityRules expression built from this ruleSuite

    Definition Classes
    RuleEngineRunnerImports
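A sketch of ruleEngineRunner usage, assuming a hypothetical RuleSuite whose runOnPassProcessors all produce the same result structure (the field names below are illustrative):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types._

val ruleSuite: RuleSuite = ??? // hypothetical suite with runOnPassProcessors

// Every output expression must produce this type
val resultType = StructType(Seq(
  StructField("account", StringType),
  StructField("adjusted", DoubleType)))

val df: DataFrame = ???
val withResults = df.withColumn("ruleEngine",
  ruleEngineRunner(ruleSuite, Some(resultType)))
```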
  77. def ruleEngineWithStruct(dataFrame: DataFrame, rules: RuleSuite, outputType: DataType, ruleEngineFieldName: String = "ruleEngine", alias: String = "main", debugMode: Boolean = false): DataFrame

    Leverages the ruleRunner to produce a new output structure, the outputType defines the output structure generated.

    This version should only be used when you require select(*, ruleRunner) to be used; it requires you to fully specify the types.

    dataFrame

    the input dataframe

    outputType

    The fields and types used to call the ruleRunner. These types must match those of the input fields

    ruleEngineFieldName

    The field name the results will be stored in, by default ruleEngine

    alias

    sets the alias to use for dataFrame when using subqueries to resolve ambiguities; setting it to an empty string (or null) will not assign an alias

    Definition Classes
    AddDataFunctionsImports
  78. def ruleEngineWithStructF[P[R] >: Dataset[R]](rules: RuleSuite, outputType: DataType, ruleEngineFieldName: String = "ruleEngine", alias: String = "main", debugMode: Boolean = false): (P[Row]) => P[Row]

    Leverages the ruleRunner to produce a new output structure, the outputType defines the output structure generated.

    This version should only be used when you require select(*, ruleRunner) to be used; it requires you to fully specify the types.

    outputType

    The fields and types used to call the ruleRunner. These types must match those of the input fields

    Definition Classes
    AddDataFunctionsImports
  79. def ruleEngineWithStructFOT[P[R] >: Dataset[R]](rules: RuleSuite, outputType: Option[DataType] = None, ruleEngineFieldName: String = "ruleEngine", alias: String = "main", debugMode: Boolean = false): (P[Row]) => P[Row]

    Optional outputType; when None, Spark will attempt to derive the output type from the first output.

    Optional outputType; when None, Spark will attempt to derive the output type from the first output. This is often correct, but nullability and ordering of fields are known problems; use the output type to correct them.

    Leverages the ruleRunner to produce a new output structure, the outputType defines the output structure generated.

    This version should only be used when you require select(*, ruleRunner) to be used; it requires you to fully specify the types.

    outputType

    The fields and types used to call the ruleRunner. These types must match those of the input fields

    Definition Classes
    AddDataFunctionsImports
  80. def ruleEngineWithStructOT(dataFrame: DataFrame, rules: RuleSuite, outputType: Option[DataType] = None, ruleEngineFieldName: String = "ruleEngine", alias: String = "main", debugMode: Boolean = false): DataFrame

    Optional outputType; when None, Spark will attempt to derive the output type from the first output.

    Optional outputType; when None, Spark will attempt to derive the output type from the first output. This is often correct, but nullability and ordering of fields are known problems; use the output type to correct them.

    Leverages the ruleRunner to produce a new output structure, the outputType defines the output structure generated.

    This version should only be used when you require select(*, ruleRunner) to be used; it requires you to fully specify the types.

    dataFrame

    the input dataframe

    outputType

    The fields and types used to call the ruleRunner. These types must match those of the input fields

    ruleEngineFieldName

    The field name the results will be stored in, by default ruleEngine

    alias

    sets the alias to use for dataFrame when using subqueries to resolve ambiguities; setting it to an empty string (or null) will not assign an alias

    Definition Classes
    AddDataFunctionsImports
  81. def ruleFolderRunner(ruleSuite: RuleSuite, startingStruct: Column, debugMode: Boolean = false, variablesPerFunc: Int = 40, variableFuncGroup: Int = 20, useType: Option[StructType] = None): Column

    Creates a column that runs the folding RuleSuite.

    Creates a column that runs the folding RuleSuite. This also forces registering the lambda functions used by that RuleSuite.

    FolderRunner runs all output expressions for matching rules in order of salience: the startingStruct is passed to the first match, its result to the second, etc. In contrast to ruleEngineRunner, OutputExpressions should be lambdas with one parameter, that of the structure

    ruleSuite

    The ruleSuite with runOnPassProcessors

    startingStruct

    This struct is passed to the first matching rule, ideally you would use the spark dsl struct function to refer to existing columns

    debugMode

    When debugMode is enabled the resultDataType is wrapped in Array of (salience, result) pairs to ease debugging

    variablesPerFunc

    Defaulting to 40; in combination with variableFuncGroup this allows customisation of the handling of the 64k JVM method size limitation when performing WholeStageCodeGen

    variableFuncGroup

    Defaulting to 20

    useType

    In the case that you must use select and can't use withColumn, you may provide a type directly to stop the NPE

    returns

    A Column representing the QualityRules expression built from this ruleSuite

    Definition Classes
    RuleFolderRunnerImports
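A sketch of ruleFolderRunner, using the Spark DSL struct function (as the documentation suggests) over two hypothetical columns:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct}

val ruleSuite: RuleSuite = ??? // output expressions are one-parameter lambdas

val df: DataFrame = ??? // hypothetical price / quantity columns

// Matching rules fold over the starting struct in salience order:
// the first match receives struct(price, quantity), the second
// receives the first match's result, and so on.
val folded = df.withColumn("folded",
  ruleFolderRunner(ruleSuite, struct(col("price"), col("quantity"))))
```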
  82. def ruleRunner(ruleSuite: RuleSuite, variablesPerFunc: Int = 40, variableFuncGroup: Int = 20): Column

    Creates a column that runs the RuleSuite suitable for DQ / Validation.

    Creates a column that runs the RuleSuite suitable for DQ / Validation. This also forces registering the lambda functions used by that RuleSuite. This forwards to the original ruleRunner via dqRuleRunner

    ruleSuite

    The Quality RuleSuite to evaluate

    variablesPerFunc

    Defaulting to 40; in combination with variableFuncGroup it allows customisation of the handling of the 64k JVM method size limitation when performing WholeStageCodeGen. You _shouldn't_ need it, but it's there just in case.

    variableFuncGroup

    Defaulting to 20

    returns

    A Column representing the Quality DQ expression built from this ruleSuite

    Definition Classes
    RuleRunnerImports
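The simplest DQ usage is a single withColumn; a minimal sketch assuming a hypothetical DQ RuleSuite:

```scala
import org.apache.spark.sql.DataFrame

val ruleSuite: RuleSuite = ??? // hypothetical DQ RuleSuite
val df: DataFrame = ???

// The defaults are normally sufficient; tune variablesPerFunc /
// variableFuncGroup only if WholeStageCodeGen hits the 64k method limit.
val withDQ = df.withColumn("dataQuality", ruleRunner(ruleSuite))
```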
  83. def rule_suite(ds: Dataset[CombinedRuleSuiteRows], id: VersionedId): Option[RuleSuite]

    Returns a specific RuleSuite from available rule suites

    Definition Classes
    VersionSpecificSerializingImports
  84. def rule_suite(row: CombinedRuleSuiteRows): RuleSuite

    Returns a specific RuleSuite from available rule suites

    Definition Classes
    VersionSpecificSerializingImports
  85. def rule_suite_from(ruleSuiteGroupName: String, ruleSuiteId: Int, ruleSuiteVersion: Int): Column

    Retrieves the RuleSuite from the RuleSuiteGroup backed Spark Session variable name against the specific ruleSuiteId and ruleSuiteVersion.

    Retrieves the RuleSuite from the RuleSuiteGroup backed Spark Session variable name against the specific ruleSuiteId and ruleSuiteVersion. Null is returned if no matching entry is found

    Definition Classes
    VersionSpecificSerializingImports
  86. def rule_suite_from(ruleSuiteGroupName: String, ruleSuiteId: Int): Column

    Retrieves the RuleSuite from the RuleSuiteGroup backed Spark Session variable name against the highest version with the ruleSuiteId.

    Retrieves the RuleSuite from the RuleSuiteGroup backed Spark Session variable name against the highest version with the ruleSuiteId. Null is returned if no matching entry is found

    Definition Classes
    VersionSpecificSerializingImports
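A sketch tying the group registration and retrieval functions together; the stable name and suite id below are hypothetical:

```scala
import org.apache.spark.sql.{Column, Dataset}

val combined: Dataset[CombinedRuleSuiteRows] = ??? // hypothetical loaded rows

// Register the whole group under a stable name...
val name: String = register_rule_suite_group_variable(combined, "myRuleSuites")

// ...then resolve the highest version of suite id 42 at query time
// (null is returned when no entry matches)
val suiteCol: Column = rule_suite_from("myRuleSuites", 42)
```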
  87. def rule_suite_group(ds: Dataset[CombinedRuleSuiteRows]): RuleSuiteGroup

    Converts CombinedRuleSuiteRows into a RuleSuiteGroup

    returns

    the resulting RuleSuiteGroup

    Definition Classes
    VersionSpecificSerializingImports
  88. def rule_suite_group(combined: Seq[CombinedRuleSuiteRows]): RuleSuiteGroup

    Converts CombinedRuleSuiteRows into a RuleSuiteGroup

    returns

    the resulting RuleSuiteGroup

    Definition Classes
    VersionSpecificSerializingImports
  89. def toDS(ruleSuite: RuleSuite): Dataset[RuleRow]

    Must have an active SparkSession before calling and only works with ExpressionRules; all other rules are converted to 1=1

    returns

    a Dataset[RuleRow] with the rules flattened out

    Definition Classes
    SerializingImports
  90. def toLambdaDS(ruleSuite: RuleSuite): Dataset[LambdaFunctionRow]

    Must have an active SparkSession before calling and only works with ExpressionRules; all other rules are converted to 1=1

    returns

    a Dataset[LambdaFunctionRow] with the lambda functions flattened out

    Definition Classes
    SerializingImports
  91. def toOutputExpressionDS(ruleSuite: RuleSuite): Dataset[OutputExpressionRow]

    Must have an active SparkSession before calling and only works with ExpressionRules; all other rules are converted to 1=1

    returns

    a Dataset[OutputExpressionRow] with the output expressions flattened out

    Definition Classes
    SerializingImports
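
    The three flattening functions above can be sketched together; ruleSuite is assumed to be an existing RuleSuite and the write path is illustrative:

```scala
// Flatten a RuleSuite into row Datasets (requires an active SparkSession;
// only ExpressionRules keep their SQL text, other rules become 1=1):
val ruleRows: Dataset[RuleRow] = toDS(ruleSuite)
val lambdaRows: Dataset[LambdaFunctionRow] = toLambdaDS(ruleSuite)
val outputRows: Dataset[OutputExpressionRow] = toOutputExpressionDS(ruleSuite)

// Each Dataset can then be persisted like any other, e.g.:
ruleRows.write.mode("overwrite").parquet("/tmp/dq_rule_rows")
```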
  92. def toRuleSuiteDF(ruleSuite: RuleSuite): DataFrame

    Utility function to ease dealing with simple DQ rules where the rule engine functionality is ignored. This can be paired with the default ruleEngine parameter in readRulesFromDF

    Definition Classes
    SerializingImports
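
    A short sketch of the simple-DQ round trip this enables; ruleSuite and the storage path are illustrative, and the exact readRulesFromDF call site is not shown on this page:

```scala
// Flatten a DQ-only RuleSuite to a DataFrame and persist it; it can later
// be reloaded via readRulesFromDF with its default ruleEngine parameter.
val rulesDF = toRuleSuiteDF(ruleSuite)
rulesDF.write.mode("overwrite").parquet("/tmp/dq_rules")
```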
  93. def toRuleSuiteRow(ruleSuite: RuleSuite): (RuleSuiteRow, Option[OutputExpressionRow])

    Creates a RuleSuiteRow from a RuleSuite for RuleSuite specific attributes and an optional defaultProcessor

    Definition Classes
    SerializingImports
  94. def toString(): String
    Definition Classes
    Any
  95. def typedExpressionRunner(ruleSuite: RuleSuite, ddlType: String, name: String = "expressionResults"): Column

    Runs the ruleSuite expressions saving results as a tuple of (ruleResult: String, resultDDL: String)

    Definition Classes
    ExpressionRunnerImports
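
    A hedged usage sketch; df, ruleSuite, and the DDL type are illustrative:

```scala
// Run the suite's expressions under the default result column name
// "expressionResults", capturing each result with its DDL type:
val results = df.select(typedExpressionRunner(ruleSuite, "STRING"))

// Or with a custom column name:
val named = df.select(typedExpressionRunner(ruleSuite, "STRING", name = "checks"))
```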
  96. object BloomModel extends Serializable
  97. object DefaultProcessor extends Serializable
  98. case object DefaultRule extends RuleResult with Product with Serializable

    Returned for RuleSuiteResult.overallResult when no other trigger rule has passed and a defaultProcessor has been configured (otherwise it's Failed)

    Annotations
    @SerialVersionUID()
  99. case object DisabledRule extends RuleResult with Product with Serializable

    This shouldn't evaluate to a fail; it allows signalling that a rule has been disabled

    Annotations
    @SerialVersionUID()
  100. object ExpressionRule extends Serializable
  101. case object Failed extends RuleResult with Product with Serializable
    Annotations
    @SerialVersionUID()
  102. case object IgnoredRule extends RuleResult with Product with Serializable

    This shouldn't evaluate to a fail; it allows signalling that a rule has been ignored

    Annotations
    @SerialVersionUID()
  103. object LambdaFunction extends Serializable
  104. object NoOpDefaultProcessor
  105. object NoOpRunOnPassProcessor
  106. object OutputExpression
  107. case object Passed extends RuleResult with Product with Serializable
    Annotations
    @SerialVersionUID()
  108. object QualityException extends Serializable
  109. object RegistrationFunction
  110. object ResultStatisticsProvider extends Serializable
  111. object RuleModel
  112. object RuleSuite extends Serializable
  113. object RuleSuiteGroup extends Serializable
  114. object RuleSuiteGroupResults extends Serializable
  115. object RuleSuiteResultDetails extends Serializable
  116. object RunOnPassProcessor extends Serializable
  117. case object SoftFailed extends RuleResult with Product with Serializable

    This shouldn't evaluate to a fail; think of it as Amber / Warn

    Annotations
    @SerialVersionUID()
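
    The RuleResult case objects above might be interpreted with a pattern match along these lines; the match is a sketch, and the catch-all covers any further RuleResult cases this page does not list:

```scala
def interpret(result: RuleResult): String = result match {
  case Passed       => "ok"
  case SoftFailed   => "warn"                      // Amber / Warn, not a failure
  case DisabledRule => "skipped (disabled)"
  case IgnoredRule  => "skipped (ignored)"
  case DefaultRule  => "default processor applied" // no trigger rule passed
  case Failed       => "failed"
  case other        => other.toString              // any other RuleResult case
}
```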
