com.sparkutils

package quality

Type Members

  1. type BloomFilterMap = Map[String, (BloomLookup, Double)]

    Used as a param to load the bloomfilter expr

    Definition Classes
    BloomFilterTypes
  2. trait BloomLookup extends AnyRef

    Simple 'does it contain' function to test a bloom

  3. case class BloomModel(rootDir: String, fpp: Double, numBuckets: Int) extends Serializable with Product with Serializable

    Represents the shared file location of a bucketed bloom filter. There should be files with names 0..numBuckets containing the same number of bytes representing each bucket.

    rootDir

    The directory which contains each bucket

    fpp

    The fpp for this bloom - note it is informational only and will not be used in further processing

    numBuckets

    The number of buckets within this bloom

  4. trait DataFrameLoader extends AnyRef

    Simple interface to load DataFrames used by map/bloom and view loading

  5. case class ExpressionRule(rule: String) extends ExprLogic with HasRuleText with Product with Serializable

    The result of serializing or loading rules

  6. case class GeneralExpressionResult(result: String, resultDDL: String) extends Product with Serializable

    Represents the expression results of ExpressionRunner

    result

    the result cast to string

    resultDDL

    the result type in DDL

  7. case class GeneralExpressionsResult[R](id: VersionedId, ruleSetResults: Map[VersionedId, Map[VersionedId, R]]) extends Serializable with Product

    Represents the results of the ExpressionRunner

  8. case class GeneralExpressionsResultNoDDL(id: VersionedId, ruleSetResults: Map[VersionedId, Map[VersionedId, String]]) extends Serializable with Product

    Represents the results of the ExpressionRunner after calling strip_result_ddl

  9. case class Id(id: Int, version: Int) extends VersionedId with Product with Serializable

    A versioned rule ID - note the name is never persisted in results, the id and version are sufficient to retrieve the name

    id

    a unique ID to identify this rule

    version

    the version of the rule - again tied to ID

  10. type IdTriple = (Id, Id, Id)
    Definition Classes
    LambdaFunctionsImports
  11. case class LazyRuleEngineResult[T](lazyRuleSuiteResults: LazyRuleSuiteResult, salientRule: Option[SalientRule], result: Option[T]) extends Serializable with Product

    Results for all rules run against a DataFrame; the RuleSuiteResult is lazily evaluated. Note in debug mode the type of T must be Seq[(Int, ActualType)]

    lazyRuleSuiteResults

    Overall results from applying the engine, evaluated lazily

    salientRule

    None when no rule matched for this row, or when Debug mode is used, which returns all results

    result

    The result type for this row; if no rule matched this will be None, and if a rule matched but the output expression returned null this will also be None

  12. case class LazyRuleFolderResult[T](lazyRuleSuiteResults: LazyRuleSuiteResult, result: Option[T]) extends Serializable with Product

    Results for all rules run against a DataFrame; the RuleSuiteResult is lazily evaluated. Note in debug mode the type of T must be Seq[(Int, ActualType)]

    lazyRuleSuiteResults

    Overall results from applying the engine, evaluated lazily

    result

    The result type for this row; if no rule matched this will be None, and if a rule matched but the output expression returned null this will also be None

  13. trait LazyRuleSuiteResult extends Serializable

    A lazy proxy for RuleSuiteResult

  14. trait LazyRuleSuiteResultDetails extends Serializable

    A lazy proxy for RuleSuiteResultDetails

  15. type MapCreator = () ⇒ (DataFrame, Column, Column)
    Definition Classes
    MapLookupImports
  16. type MapLookups = Map[String, (Broadcast[MapData], DataType)]

    Used as a param to load the map lookups - note the type of the broadcast is always Map[AnyRef, AnyRef]

    Definition Classes
    MapLookupImports
  17. case class OutputExpression(rule: String) extends OutputExprLogic with HasRuleText with Logging with Product with Serializable

    Used as a result of serializing

  18. case class OverallResult(probablePass: Double = 0.8, currentResult: RuleResult = Passed) extends Product with Serializable

    Probability is evaluated at over probablePass percent, defaults to 80% (0.8). Passed until any failure occurs

  19. case class Probability(percentage: Double) extends RuleResult with Product with Serializable

    0-1 with 1 being absolutely likely a pass

  20. case class QualityException(msg: String, cause: Exception = null) extends RuntimeException with Product with Serializable

    Simple marker instead of sys.error

  21. case class Rule(id: Id, expression: RuleLogic, runOnPassProcessor: RunOnPassProcessor = NoOpRunOnPassProcessor.noOp) extends Serializable with Product

    A rule to run over a row

  22. case class RuleEngineResult[T](ruleSuiteResults: RuleSuiteResult, salientRule: Option[SalientRule], result: Option[T]) extends Serializable with Product

    Results for all rules run against a DataFrame. Note in debug mode the type of T must be Seq[(Int, ActualType)]

    ruleSuiteResults

    Overall results from applying the engine

    salientRule

    None when no rule matched for this row, or when Debug mode is used, which returns all results

    result

    The result type for this row; if no rule matched this will be None, and if a rule matched but the output expression returned null this will also be None

  23. case class RuleFolderResult[T](ruleSuiteResults: RuleSuiteResult, result: Option[T]) extends Serializable with Product

    Results for all rules run against a DataFrame. Note in debug mode the type of T must be Seq[(Int, ActualType)]

    ruleSuiteResults

    Overall results from applying the engine

    result

    The result type for this row; if no rule matched this will be None, and if a rule matched but the output expression returned null this will also be None

  24. sealed trait RuleResult extends Serializable
  25. case class RuleResultWithProcessor(ruleResult: RuleResult, runOnPassProcessor: RunOnPassProcessor) extends RuleResult with Product with Serializable

    Packs a rule result with a RunOnPassProcessor processor

  26. case class RuleSet(id: Id, rules: Seq[Rule]) extends Serializable with Product
  27. case class RuleSetResult(overallResult: RuleResult, ruleResults: Map[VersionedId, RuleResult]) extends Serializable with Product

    Result collection for a number of rules

    ruleResults

    rule id -> RuleResult

  28. case class RuleSuite(id: Id, ruleSets: Seq[RuleSet], lambdaFunctions: Seq[LambdaFunction] = Seq.empty, probablePass: Double = 0.8) extends Serializable with Product

    Represents a versioned collection of RuleSets

    probablePass

    override to specify a different percentage for treating probability results as passes - defaults to 80% (0.8)
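
    Example (an illustrative sketch of assembling a minimal suite from Id, Rule, RuleSet and ExpressionRule; the ids and the SQL expression are assumptions):

      import com.sparkutils.quality._

      // a single DQ rule built from a hypothetical SQL predicate
      val priceRule = Rule(Id(100, 1), ExpressionRule("price > 0"))
      val ruleSet   = RuleSet(Id(10, 1), Seq(priceRule))
      val ruleSuite = RuleSuite(Id(1, 1), Seq(ruleSet))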

  29. case class RuleSuiteResult(id: VersionedId, overallResult: RuleResult, ruleSetResults: Map[VersionedId, RuleSetResult]) extends Serializable with Product

    Results for all rules run against a dataframe

    id

    - the Id of the suite, all other content is mapped

  30. case class RuleSuiteResultDetails(id: VersionedId, ruleSetResults: Map[VersionedId, RuleSetResult]) extends Serializable with Product

    Results for all rules run against a dataframe without the overallResult. Performance differences for filtering on top level fields are significant over nested structures even under Spark 3, in the region of 30-50% depending on op.

  31. case class SalientRule(ruleSuiteId: VersionedId, ruleSetId: VersionedId, ruleId: VersionedId) extends Product with Serializable

    Represents the rule that matched a given RuleEngine result

  32. case class SparkBloomFilter(bloom: BloomFilter) extends BloomLookup with Product with Serializable

Abstract Value Members

  1. abstract def getClass(): Class[_]
    Definition Classes
    Any

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    Any
  2. final def ##(): Int
    Definition Classes
    Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    Any
  4. val DisabledRuleInt: Int

    The integer value for disabled dq rules

    Definition Classes
    RuleRunnerImports
  5. val FailedInt: Int

    The integer value for failed dq rules

    Definition Classes
    RuleRunnerImports
  6. val PassedInt: Int

    The integer value for passed dq rules

    Definition Classes
    RuleRunnerImports
  7. val SoftFailedInt: Int

    The integer value for soft failed dq rules

    Definition Classes
    RuleRunnerImports
  8. def addDataQuality(dataFrame: DataFrame, rules: RuleSuite, name: String = "DataQuality", compileEvals: Boolean = false, forceRunnerEval: Boolean = false): DataFrame

    Adds a DataQuality field using the RuleSuite and RuleSuiteResult structure

    Definition Classes
    AddDataFunctionsImports
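
    Example (an illustrative sketch; tradesDf and ruleSuite are assumed to exist):

      // appends a "DataQuality" struct column containing the RuleSuiteResult
      val dqDf = addDataQuality(tradesDf, ruleSuite)

      // or, composed with Dataset.transform via addDataQualityF
      val dqDf2 = tradesDf.transform(addDataQualityF(ruleSuite))
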
  9. def addDataQualityF(rules: RuleSuite, name: String = "DataQuality", compileEvals: Boolean = false, forceRunnerEval: Boolean = false): (DataFrame) ⇒ DataFrame

    Adds a DataQuality field using the RuleSuite and RuleSuiteResult structure for use with dataset.transform functions

    Definition Classes
    AddDataFunctionsImports
  10. def addExpressionRunner(dataFrame: DataFrame, ruleSuite: RuleSuite, name: String = "expressionResults", renderOptions: Map[String, String] = Map.empty, ddlType: String = "", forceRunnerEval: Boolean = false, compileEvals: Boolean = false, stripDDL: Boolean = false): DataFrame

    Runs the ruleSuite expressions, saving results as a tuple of (ruleResult: yaml, resultType: String). Supplying a ddlType triggers the output type for the expression to be that ddl type, rather than using yaml conversion.

    name

    the default column name "expressionResults"

    renderOptions

    provides rendering options to the underlying snake yaml implementation

    ddlType

    optional DDL string, when present yaml output is disabled and the output expressions must all have the same type

    stripDDL

    the default of false leaves the DDL present, true strips ddl from the output using strip_result_ddl

    Definition Classes
    AddDataFunctionsImports
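
    Example (an illustrative sketch; tradesDf and ruleSuite are assumed to exist):

      // appends an "expressionResults" column of (ruleResult yaml, resultDDL) per expression
      val withResults = addExpressionRunner(tradesDf, ruleSuite)
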
  11. def addExpressionRunnerF[P[R] >: DatasetBase[R]](ruleSuite: RuleSuite, name: String = "expressionResults", renderOptions: Map[String, String] = Map.empty, ddlType: String = "", forceRunnerEval: Boolean = false, compileEvals: Boolean = false, stripDDL: Boolean = false): (P[Row]) ⇒ P[Row]

    Runs the ruleSuite expressions, saving results as a tuple of (ruleResult: yaml, resultType: String). Supplying a ddlType triggers the output type for the expression to be that ddl type, rather than using yaml conversion.

    name

    the default column name "expressionResults"

    renderOptions

    provides rendering options to the underlying snake yaml implementation

    ddlType

    optional DDL string, when present yaml output is disabled and the output expressions must all have the same type

    stripDDL

    the default of false leaves the DDL present, true strips ddl from the output using strip_result_ddl

    Definition Classes
    AddDataFunctionsImports
  12. def addOverallResultsAndDetails(dataFrame: DataFrame, rules: RuleSuite, overallResult: String = "DQ_overallResult", resultDetails: String = "DQ_Details", compileEvals: Boolean = false, forceRunnerEval: Boolean = false): DataFrame

    Adds two columns, one for overallResult and the other the details, allowing 30-50% performance gains for simple filters

    Definition Classes
    AddDataFunctionsImports
  13. def addOverallResultsAndDetailsF(rules: RuleSuite, overallResult: String = "DQ_overallResult", resultDetails: String = "DQ_Details", compileEvals: Boolean = false, forceRunnerEval: Boolean = false): (DataFrame) ⇒ DataFrame

    Adds two columns, one for overallResult and the other the details, allowing 30-50% performance gains for simple filters, for use in dataset.transform functions

    Definition Classes
    AddDataFunctionsImports
  14. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  15. val bloomFileLocation: String

    Where the bloom filter files will be stored

    Definition Classes
    BucketedCreatorFunctionImports
  16. def bloomFrom(dataFrame: DataFrame, bloomOn: String, expectedSize: Int, fpp: Double, bloomId: String, partitions: Int): BloomModel

    Generates a bloom filter using the expression passed via bloomOn with optional repartitioning and a default fpp of 0.01. A unique run will be created so other results can co-exist but this can be overridden by specifying bloomId. The other interim result files will be removed and the resulting BucketedFiles can be used in lookups or stored elsewhere.

    bloomOn

    a compatible expression to generate the bloom hashes against

    Definition Classes
    BucketedCreatorFunctionImports
  17. def bloomFrom(dataFrame: DataFrame, bloomOn: String, expectedSize: Int, fpp: Double, bloomId: String): BloomModel

    Generates a bloom filter using the expression passed via bloomOn with optional repartitioning and a default fpp of 0.01. A unique run will be created so other results can co-exist but this can be overridden by specifying bloomId. The other interim result files will be removed and the resulting BucketedFiles can be used in lookups or stored elsewhere.

    bloomOn

    a compatible expression to generate the bloom hashes against

    Definition Classes
    BucketedCreatorFunctionImports
  18. def bloomFrom(dataFrame: DataFrame, bloomOn: String, expectedSize: Int, fpp: Double): BloomModel

    Generates a bloom filter using the expression passed via bloomOn with optional repartitioning and a default fpp of 0.01. A unique run will be created so other results can co-exist but this can be overridden by specifying bloomId. The other interim result files will be removed and the resulting BucketedFiles can be used in lookups or stored elsewhere.

    bloomOn

    a compatible expression to generate the bloom hashes against

    Definition Classes
    BucketedCreatorFunctionImports
  19. def bloomFrom(dataFrame: DataFrame, bloomOn: String, expectedSize: Int): BloomModel

    Generates a bloom filter using the expression passed via bloomOn with optional repartitioning and a default fpp of 0.01. A unique run will be created so other results can co-exist but this can be overridden by specifying bloomId. The other interim result files will be removed and the resulting BucketedFiles can be used in lookups or stored elsewhere.

    bloomOn

    a compatible expression to generate the bloom hashes against

    Definition Classes
    BucketedCreatorFunctionImports
  20. def bloomFrom(dataFrame: DataFrame, bloomOn: Column, expectedSize: Int, fpp: Double = 0.01, bloomId: String = ..., partitions: Int = 0): BloomModel

    Generates a bloom filter using the expression passed via bloomOn with optional repartitioning and a default fpp of 0.01. A unique run will be created so other results can co-exist but this can be overridden by specifying bloomId. The other interim result files will be removed and the resulting BucketedFiles can be used in lookups or stored elsewhere.

    bloomOn

    a compatible expression to generate the bloom hashes against

    Definition Classes
    BucketedCreatorFunctionImports
  21. def bloomLookup(bucketedFiles: BloomModel): BloomLookup

    Creates a very large bloom filter from multiple buckets of 2gb arrays backed by the default Parquet implementation

  22. def bloomLookup(bytes: Array[Byte]): BloomLookup

    Creates a bloom filter from an array of bytes using the default Parquet bloom filter implementation
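
    Example (an illustrative sketch of bloomFrom followed by bloomLookup; idsDf, the column name and the expected size are assumptions):

      // build a bucketed bloom over the id column, sized for roughly one million distinct values
      val model: BloomModel = bloomFrom(idsDf, "id", 1000000)
      // re-open the bucketed files as a lookup
      val lookup: BloomLookup = bloomLookup(model)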

  23. def enableFunNRewrites(): Unit

    Enables the FunNRewrite optimisation. Where a user configured LambdaFunction does not have nested HigherOrderFunctions, or declares the /* USED_AS_LAMBDA */ comment, the lambda function will be expanded, replacing all LambdaVariables with the input expressions.

    Definition Classes
    RuleRunnerFunctionsImport
  24. def enableOptimizations(extraOptimizations: Seq[org.apache.spark.sql.catalyst.rules.Rule[LogicalPlan]]): Unit

    Enables optimisations via experimental.extraOptimizations, first checking if they are already present and ensuring all are added in order. It can be used in place of enableFunNRewrites or in addition to it; for example, adding ConstantFolding may improve performance for given rule types

    extraOptimizations

    rules to be added in order

    Definition Classes
    RuleRunnerFunctionsImport
  25. def equals(arg0: Any): Boolean
    Definition Classes
    Any
  26. def expressionRunner(ruleSuite: RuleSuite, name: String = "expressionResults", renderOptions: Map[String, String] = Map.empty, forceRunnerEval: Boolean = false): Column
    Definition Classes
    ExpressionRunnerImports
  27. def foldAndReplaceFieldPairs[P[R] >: DatasetBase[R]](rules: RuleSuite, fields: Seq[(String, Column)], foldFieldName: String = "foldedFields", debugMode: Boolean = false, tempFoldDebugName: String = "tempFOLDDEBUG", maintainOrder: Boolean = true, compileEvals: Boolean = false, forceRunnerEval: Boolean = false, forceTriggerEval: Boolean = false): (P[Row]) ⇒ P[Row]

    Leverages the foldRunner to replace fields; the input field pairs are used to create a structure that the rules fold over. Any field references in the provided pairs are then dropped from the original Dataframe and added back from the resulting structure.

    NOTE: The field order and types of the original DF will be maintained only when maintainOrder is true. As it requires access to the schema it may incur extra work.

    debugMode

    when true the last results are taken for the replaced fields

    maintainOrder

    when true the schema is used to replace fields in the correct location, when false they are simply appended

    Definition Classes
    AddDataFunctionsImports
  28. def foldAndReplaceFieldPairsWithStruct[P[R] >: DatasetBase[R]](rules: RuleSuite, fields: Seq[(String, Column)], struct: StructType, foldFieldName: String = "foldedFields", debugMode: Boolean = false, tempFoldDebugName: String = "tempFOLDDEBUG", maintainOrder: Boolean = true, compileEvals: Boolean = false, forceRunnerEval: Boolean = false, forceTriggerEval: Boolean = false): (P[Row]) ⇒ P[Row]

    Leverages the foldRunner to replace fields; the input field pairs are used to create a structure that the rules fold over, and the type must match the provided struct. Any field references in the provided pairs are then dropped from the original Dataframe and added back from the resulting structure.

    This version should only be used when you require select(*, foldRunner) to be used, as it requires you to fully specify types.

    NOTE: The field order and types of the original DF will be maintained only when maintainOrder is true. As it requires access to the schema it may incur extra work.

    struct

    The fields, and types, are used to call the foldRunner. These types must match in the input fields

    debugMode

    when true the last results are taken for the replaced fields

    maintainOrder

    when true the schema is used to replace fields in the correct location, when false they are simply appended

    Definition Classes
    AddDataFunctionsImports
  29. def foldAndReplaceFields[P[R] >: DatasetBase[R]](rules: RuleSuite, fields: Seq[String], foldFieldName: String = "foldedFields", debugMode: Boolean = false, tempFoldDebugName: String = "tempFOLDDEBUG", maintainOrder: Boolean = true, compileEvals: Boolean = false, forceRunnerEval: Boolean = false, forceTriggerEval: Boolean = false): (P[Row]) ⇒ P[Row]

    Leverages the foldRunner to replace fields; the input fields are used to create a structure that the rules fold over. The fields are then dropped from the original Dataframe and added back from the resulting structure.

    NOTE: The field order and types of the original DF will be maintained only when maintainOrder is true. As it requires access to the schema it may incur extra work.

    debugMode

    when true the last results are taken for the replaced fields

    maintainOrder

    when true the schema is used to replace fields in the correct location, when false they are simply appended

    Definition Classes
    AddDataFunctionsImports
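
    Example (an illustrative sketch; tradesDf, the field names and ruleSuite are assumptions):

      // folds the suite over a struct of (amount, currency) and writes the fields back in place
      val repaired = tradesDf.transform(foldAndReplaceFields(ruleSuite, Seq("amount", "currency")))
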
  30. def foldAndReplaceFieldsWithStruct[P[R] >: DatasetBase[R]](rules: RuleSuite, struct: StructType, foldFieldName: String = "foldedFields", debugMode: Boolean = false, tempFoldDebugName: String = "tempFOLDDEBUG", maintainOrder: Boolean = true, compileEvals: Boolean = false, forceRunnerEval: Boolean = false, forceTriggerEval: Boolean = false): (P[Row]) ⇒ P[Row]

    Leverages the foldRunner to replace fields; the input fields are used to create a structure that the rules fold over. The fields are then dropped from the original Dataframe and added back from the resulting structure.

    This version should only be used when you require select(*, foldRunner) to be used, as it requires you to fully specify types.

    NOTE: The field order and types of the original DF will be maintained only when maintainOrder is true. As it requires access to the schema it may incur extra work.

    struct

    The fields, and types, are used to call the foldRunner. These types must match in the input fields

    debugMode

    when true the last results are taken for the replaced fields

    maintainOrder

    when true the schema is used to replace fields in the correct location, when false they are simply appended

    Definition Classes
    AddDataFunctionsImports
  31. def getBlooms(ruleSuite: RuleSuite): Seq[String]

    Identifies bloom ids before (or after) resolving for a given ruleSuite, use to know which bloom filters need to be loaded

    ruleSuite

    a ruleSuite full of expressions to check

    returns

    The bloom id's used, for unresolved expression trees this may contain blooms which are not present in the bloom map

    Definition Classes
    BloomFilterLookupImports
  32. def getConfig(name: String, default: String = ""): String

    First attempts to get the system environment variable, then the Java system property, then SQLConf

  33. def hashCode(): Int
    Definition Classes
    Any
  34. def identifyLookups(ruleSuite: RuleSuite): LookupResults

    Use this function to identify which maps / blooms etc. are used by a given rulesuite: it collects all rules that are using lookup functions but without constant expressions, and the list of lookups that are constants.

    Definition Classes
    LookupIdFunctionsImports
  35. def integrateLambdas(ruleSuiteMap: RuleSuiteMap, lambdas: Map[Id, Seq[LambdaFunction]], globalLibrary: Option[Id] = None): RuleSuiteMap

    Add any of the Lambdas loaded for the rule suites

    globalLibrary

    - all lambdas with this RuleSuite Id will be added to all RuleSuites

    Definition Classes
    SerializingImports
  36. def integrateMetaRuleSets(dataFrame: DataFrame, ruleSuiteMap: RuleSuiteMap, metaRuleSetMap: Map[Id, Seq[MetaRuleSetRow]], stablePosition: (String) ⇒ Int, transform: (DataFrame) ⇒ DataFrame = identity): RuleSuiteMap

    Integrates meta rulesets into the rulesuites. Note this only works for a specific dataset; if rulesuites should be filtered for a given dataset then this must take place before calling.

    dataFrame

    the dataframe to identify columns for metarules

    ruleSuiteMap

    the ruleSuites relevant for this dataframe

    metaRuleSetMap

    the meta rulesets

    stablePosition

    this function must maintain the law that each column name within a RuleSet always generates the same position

    Definition Classes
    SerializingImports
  37. def integrateOutputExpressions(ruleSuiteMap: RuleSuiteMap, outputs: Map[Id, Seq[OutputExpressionRow]], globalLibrary: Option[Id] = None): (RuleSuiteMap, Map[Id, Set[Rule]])

    Returns an integrated ruleSuiteMap with a set of RuleSuite Id -> Rule mappings where the OutputExpression didn't exist.

    Users should check if their RuleSuite is in the "error" map.

    Definition Classes
    SerializingImports
  38. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  39. def loadBloomConfigs(loader: DataFrameLoader, viewDF: DataFrame, name: Column, token: Column, filter: Column, sql: Column, bigBloom: Column, value: Column, numberOfElements: Column, expectedFPP: Column): (Seq[BloomConfig], Set[String])

    Loads bloom configurations from a given DataFrame. Wherever token is present loader will be called and the filter optionally applied.

    returns

    A tuple of BloomConfig's and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    BloomFilterLookupImports
  40. def loadBloomConfigs(loader: DataFrameLoader, viewDF: DataFrame, ruleSuiteIdColumn: Column, ruleSuiteVersionColumn: Column, ruleSuiteId: Id, name: Column, token: Column, filter: Column, sql: Column, bigBloom: Column, value: Column, numberOfElements: Column, expectedFPP: Column): (Seq[BloomConfig], Set[String])

    Loads bloom configurations from a given DataFrame for ruleSuiteId. Wherever token is present loader will be called and the filter optionally applied.

    returns

    A tuple of BloomConfig's and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    BloomFilterLookupImports
  41. def loadBlooms(configs: Seq[BloomConfig]): quality.impl.bloom.BloomExpressionLookup.BloomFilterMap

    Loads bloom maps ready to register from configuration

    Definition Classes
    BloomFilterLookupImports
  42. def loadMapConfigs(loader: DataFrameLoader, viewDF: DataFrame, name: Column, token: Column, filter: Column, sql: Column, key: Column, value: Column): (Seq[MapConfig], Set[String])

    Loads map configurations from a given DataFrame. Wherever token is present loader will be called and the filter optionally applied.

    returns

    A tuple of MapConfig's and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    MapLookupImports
  43. def loadMapConfigs(loader: DataFrameLoader, viewDF: DataFrame, ruleSuiteIdColumn: Column, ruleSuiteVersionColumn: Column, ruleSuiteId: Id, name: Column, token: Column, filter: Column, sql: Column, key: Column, value: Column): (Seq[MapConfig], Set[String])

    Loads map configurations from a given DataFrame for ruleSuiteId. Wherever token is present loader will be called and the filter optionally applied.

    returns

    A tuple of MapConfig's and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    MapLookupImports
  44. def loadMaps(configs: Seq[MapConfig]): MapLookups
    Definition Classes
    MapLookupImports
  45. def loadViewConfigs(loader: DataFrameLoader, viewDF: DataFrame, name: Column, token: Column, filter: Column, sql: Column): (Seq[ViewConfig], Set[String])

    Loads view configurations from a given DataFrame. Wherever token is present loader will be called and the filter optionally applied.

    returns

    A tuple of ViewConfig's and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    ViewLoading
  46. def loadViewConfigs(loader: DataFrameLoader, viewDF: DataFrame, ruleSuiteIdColumn: Column, ruleSuiteVersionColumn: Column, ruleSuiteId: Id, name: Column, token: Column, filter: Column, sql: Column): (Seq[ViewConfig], Set[String])

    Loads view configurations from a given DataFrame for ruleSuiteId. Wherever token is present loader will be called and the filter optionally applied.

    returns

    A tuple of ViewConfig's and the names of rows which had unexpected content (either token or sql must be present)

    Definition Classes
    ViewLoading
  47. def loadViews(viewConfigs: Seq[ViewConfig]): ViewLoadResults

    Attempts to load all the views present in the config. If a view is already registered in that name it will be replaced.

    returns

    the names of views which have been replaced

    Definition Classes
    ViewLoading
  48. def mapLookupsFromDFs(creators: Map[String, MapCreator], broadcastFunction: (MapData) ⇒ Broadcast[MapData] = ...): MapLookups

    Loads maps to broadcast; each individual dataframe may have different associated expressions

    creators

    a map of string id to MapCreator

    returns

    a map of id to broadcast variables needed for exact lookup and mapping checks

    Definition Classes
    MapLookupImports
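
    Example (an illustrative sketch; countryDf and its column names are assumptions):

      import org.apache.spark.sql.functions.col

      // broadcast a country code -> name lookup registered under the id "countries"
      val lookups: MapLookups = mapLookupsFromDFs(Map(
        "countries" -> (() => (countryDf, col("isoCode"), col("countryName")))
      ))
      registerMapLookupsAndFunction(lookups)
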
  49. def namesFromSchema(schema: StructType): Set[String]

    Retrieves the names (with parent fields following the . notation for any nested fields) from the schema

    Definition Classes
    LookupIdFunctionsImports
  50. def optimalNumOfBits(n: Long, p: Double): Int

    Calculate optimal size according to the number of distinct values and false positive probability.

    n

    The number of distinct values.

    p

    The false positive probability.

    returns

    the optimal number of bits for the given n and p.

    Definition Classes
    BlockSplitBloomFilterImports
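
    Example (an illustrative sketch; the sizing values are assumptions, and the classic bloom sizing formula -n * ln(p) / (ln 2)^2 is presumably the basis for this kind of calculation):

      // bits required for roughly one million distinct values at a 1% false positive probability
      val bits = optimalNumOfBits(1000000L, 0.01)
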
  51. def optimalNumberOfBuckets(n: Long, p: Double): Long
    Definition Classes
    BlockSplitBloomFilterImports
  52. def processCoalesceIfAttributeMissing(expression: Expression, names: Set[String]): Expression
  53. def processIfAttributeMissing(ruleSuite: RuleSuite, schema: StructType = StructType(Seq())): RuleSuite

    Processes a given RuleSuite to replace any coalesceIfMissingAttributes. This may be called before validate / docs but *must* be called *before* adding the expression to a dataframe.

    schema

    The names to validate against, if empty no attempt to process coalesceIfAttributeMissing will be made

    Definition Classes
    ProcessDisableIfMissingImports
  54. def readLambdasFromDF(lambdaFunctionDF: DataFrame, lambdaFunctionName: Column, lambdaFunctionExpression: Column, lambdaFunctionId: Column, lambdaFunctionVersion: Column, lambdaFunctionRuleSuiteId: Column, lambdaFunctionRuleSuiteVersion: Column): Map[Id, Seq[LambdaFunction]]

    Loads lambda functions

    Definition Classes
    SerializingImports
  55. def readMetaRuleSetsFromDF(metaRuleSetDF: DataFrame, columnSelectionExpression: Column, lambdaFunctionExpression: Column, metaRuleSetId: Column, metaRuleSetVersion: Column, metaRuleSuiteId: Column, metaRuleSuiteVersion: Column): Map[Id, Seq[MetaRuleSetRow]]
    Definition Classes
    SerializingImports
  56. def readOutputExpressionsFromDF(outputExpressionDF: DataFrame, outputExpression: Column, outputExpressionId: Column, outputExpressionVersion: Column, outputExpressionRuleSuiteId: Column, outputExpressionRuleSuiteVersion: Column): Map[Id, Seq[OutputExpressionRow]]

    Loads output expressions

    Definition Classes
    SerializingImports
  57. def readRulesFromDF(df: DataFrame, ruleSuiteId: Column, ruleSuiteVersion: Column, ruleSetId: Column, ruleSetVersion: Column, ruleId: Column, ruleVersion: Column, ruleExpr: Column, ruleEngineSalience: Column, ruleEngineId: Column, ruleEngineVersion: Column): RuleSuiteMap

    Loads a RuleSuite from a dataframe with integers ruleSuiteId, ruleSuiteVersion, ruleSetId, ruleSetVersion, ruleId, ruleVersion and an expression string ruleExpr

    Definition Classes
    SerializingImports
  58. def readRulesFromDF(df: DataFrame, ruleSuiteId: Column, ruleSuiteVersion: Column, ruleSetId: Column, ruleSetVersion: Column, ruleId: Column, ruleVersion: Column, ruleExpr: Column): RuleSuiteMap

    Loads a RuleSuite from a dataframe with integers ruleSuiteId, ruleSuiteVersion, ruleSetId, ruleSetVersion, ruleId, ruleVersion and an expression string ruleExpr

    Definition Classes
    SerializingImports
  59. def registerBloomMapAndFunction(bloomFilterMap: Broadcast[quality.impl.bloom.BloomExpressionLookup.BloomFilterMap]): Unit

    Registers this bloom map and associates the probabilityIn sql expression against it

    Definition Classes
    BloomFilterRegistration
  60. def registerLambdaFunctions(functions: Seq[LambdaFunction]): Unit

    This function is for use with directly loading lambda functions to use within normal spark apps.

    Please be aware loading rules via ruleRunner, addOverallResultsAndDetails or addDataQuality will also load any registered rules applicable for that ruleSuite and may override other registered rules. For that reason you should not use this directly together with RuleSuites (otherwise your results will not be reproducible).

    Definition Classes
    LambdaFunctionsImports
  61. def registerMapLookupsAndFunction(mapLookups: MapLookups): Unit
    Definition Classes
    MapLookupImports
  62. def registerQualityFunctions(parseTypes: (String) ⇒ Option[DataType] = defaultParseTypes, zero: (DataType) ⇒ Option[Any] = defaultZero, add: (DataType) ⇒ Option[(Expression, Expression) ⇒ Expression] = ..., mapCompare: (DataType) ⇒ Option[(Any, Any) ⇒ Int] = ..., writer: (String) ⇒ Unit = println(_), registerFunction: (String, (Seq[Expression]) ⇒ Expression) ⇒ Unit = ...): Unit

    Must be called before using any functions like Passed, Failed or Probability(X)

    parseTypes

    override type parsing (e.g. DDL, defaults to defaultParseTypes / DataType.fromDDL)

    zero

    override zero creation for aggExpr (defaults to defaultZero)

    add

    override the "add" function for aggExpr types (defaults to defaultAdd(dataType))

    writer

    override the printCode and printExpr print writing function (defaults to println)

    registerFunction

    function to register the sql extensions

    Definition Classes
    RuleRunnerFunctionsImport
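
    Example (an illustrative sketch using the defaults; presumably called once per SparkSession before building any rule columns):

      registerQualityFunctions()
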
  63. def ruleEngineRunner(ruleSuite: RuleSuite, resultDataType: DataType, compileEvals: Boolean = true, debugMode: Boolean = false, resolveWith: Option[DataFrame] = None, variablesPerFunc: Int = 40, variableFuncGroup: Int = 20, forceRunnerEval: Boolean = false, forceTriggerEval: Boolean = true): Column

    Creates a column that runs the RuleSuite. This also forces registering the lambda functions used by that RuleSuite

    ruleSuite

    The ruleSuite with runOnPassProcessors

    resultDataType

    The type of the results from runOnPassProcessors - must be the same for all result types

    compileEvals

    Should the rules be compiled out to interim objects - by default true for eval usage, wholeStageCodeGen will evaluate in place unless forceTriggerEval is set to false

    debugMode

    When debugMode is enabled the resultDataType is wrapped in Array of (salience, result) pairs to ease debugging

    resolveWith

    This experimental parameter can take the DataFrame these rules will be added to and pre-resolve and optimise the sql expressions, see the documentation for details on when to and not to use this.

    variablesPerFunc

    Defaulting to 40; in combination with variableFuncGroup it allows customisation of handling the 64k jvm method size limitation when performing WholeStageCodeGen

    variableFuncGroup

    Defaulting to 20

    forceRunnerEval

    Defaulting to false, passing true forces a simplified partially interpreted evaluation (compileEvals must be false to get fully interpreted)

    forceTriggerEval

    Defaulting to true, passing true forces each trigger expression to be compiled (compileEvals) and used in place, false instead expands the trigger in-line giving possible performance boosts based on JIT. Most testing has however shown this not to be the case hence the default, ymmv.

    returns

    A Column representing the QualityRules expression built from this ruleSuite

    Definition Classes
    RuleEngineRunnerImports
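
    Example (an illustrative sketch; tradesDf, ruleSuite and the string result type are assumptions):

      import org.apache.spark.sql.types.StringType

      // every runOnPassProcessor in the suite is assumed to produce a StringType result
      val withEngine = tradesDf.withColumn("ruleEngine", ruleEngineRunner(ruleSuite, StringType))
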
  64. def ruleEngineWithStruct(dataFrame: DataFrame, rules: RuleSuite, outputType: DataType, ruleEngineFieldName: String = "ruleEngine", alias: String = "main", debugMode: Boolean = false, compileEvals: Boolean = false, forceRunnerEval: Boolean = false, forceTriggerEval: Boolean = false): DataFrame

    Leverages the ruleRunner to produce a new output structure; the outputType defines the output structure generated.

    This version should only be used when you require select(*, ruleRunner) to be used, as it requires you to fully specify types.

    dataFrame

    the input dataframe

    outputType

    The fields, and types, are used to call the foldRunner. These types must match in the input fields

    ruleEngineFieldName

    The field name the results will be stored in, by default ruleEngine

    alias

    sets the alias to use for dataFrame when using subqueries to resolve ambiguities, setting to an empty string (or null) will not assign an alias

    Definition Classes
    AddDataFunctionsImports
  65. def ruleEngineWithStructF[P[R] >: DatasetBase[R]](rules: RuleSuite, outputType: DataType, ruleEngineFieldName: String = "ruleEngine", alias: String = "main", debugMode: Boolean = false, compileEvals: Boolean = false, forceRunnerEval: Boolean = false, forceTriggerEval: Boolean = false): (P[Row]) ⇒ P[Row]

    Leverages the ruleRunner to produce a new output structure; the outputType defines the output structure generated.

    This version should only be used when you require select(*, ruleRunner) to be used, as it requires you to fully specify types.

    outputType

    The fields, and types, are used to call the foldRunner. These types must match in the input fields

    Definition Classes
    AddDataFunctionsImports
  66. def ruleFolderRunner(ruleSuite: RuleSuite, startingStruct: Column, compileEvals: Boolean = false, debugMode: Boolean = false, resolveWith: Option[DataFrame] = None, variablesPerFunc: Int = 40, variableFuncGroup: Int = 20, forceRunnerEval: Boolean = false, useType: Option[StructType] = None, forceTriggerEval: Boolean = false): Column

    Creates a column that runs the folding RuleSuite. This also forces registering the lambda functions used by that RuleSuite.

    FolderRunner runs all output expressions for matching rules in order of salience; the startingStruct is passed to the first matching rule, its result is passed to the second, and so on. In contrast to ruleEngineRunner, OutputExpressions should be lambdas with one parameter, that of the structure

    ruleSuite

    The ruleSuite with runOnPassProcessors

    startingStruct

    This struct is passed to the first matching rule, ideally you would use the spark dsl struct function to refer to existing columns

    compileEvals

    Should the rules be compiled out to interim objects - by default false, allowing optimisations

    debugMode

    When debugMode is enabled the resultDataType is wrapped in Array of (salience, result) pairs to ease debugging

    resolveWith

    This experimental parameter can take the DataFrame these rules will be added to and pre-resolve and optimise the sql expressions, see the documentation for details on when to and not to use this.

    variablesPerFunc

    Defaulting to 40; in combination with variableFuncGroup it allows customisation of handling the 64k jvm method size limitation when performing WholeStageCodeGen

    variableFuncGroup

    Defaulting to 20

    forceRunnerEval

    Defaulting to false, passing true forces a simplified partially interpreted evaluation (compileEvals must be false to get fully interpreted)

    useType

    In the case you must use select and can't use withColumn you may provide a type directly to stop the NPE

    forceTriggerEval

    Defaulting to false, passing true forces each trigger expression to be compiled (compileEvals) and used in place, false instead expands the trigger in-line giving possible performance boosts based on JIT

    returns

    A Column representing the QualityRules expression built from this ruleSuite

    Definition Classes
    RuleFolderRunnerImports
  67. def ruleRunner(ruleSuite: RuleSuite, compileEvals: Boolean = true, resolveWith: Option[DataFrame] = None, variablesPerFunc: Int = 40, variableFuncGroup: Int = 20, forceRunnerEval: Boolean = false): Column

    Creates a column that runs the RuleSuite. This also forces registering the lambda functions used by that RuleSuite

    ruleSuite

    The Quality RuleSuite to evaluate

    compileEvals

    Should the rules be compiled out to interim objects - by default true for eval usage, wholeStageCodeGen will evaluate in place

    resolveWith

    This experimental parameter can take the DataFrame these rules will be added to and pre-resolve and optimise the sql expressions, see the documentation for details on when to and not to use this. RuleRunner does not currently do wholestagecodegen when resolveWith is used.

    variablesPerFunc

    Defaulting to 40; in combination with variableFuncGroup it allows customisation of handling the 64k jvm method size limitation when performing WholeStageCodeGen. You _shouldn't_ need it but it's there just in case.

    variableFuncGroup

    Defaulting to 20

    forceRunnerEval

    Defaulting to false, passing true forces a simplified partially interpreted evaluation (compileEvals must be false to get fully interpreted)

    returns

    A Column representing the Quality DQ expression built from this ruleSuite

    Definition Classes
    RuleRunnerImports
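
    Example (an illustrative sketch; tradesDf and ruleSuite are assumed to exist):

      // adds a column holding the RuleSuiteResult structure for every row
      val checked = tradesDf.withColumn("DataQuality", ruleRunner(ruleSuite))
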
  68. def toDS(ruleSuite: RuleSuite): Dataset[RuleRow]

    Must have an active sparksession before calling and only works with ExpressionRule's, all other rules are converted to 1=1

    returns

    a Dataset[RuleRow] with the rules flattened out

    Definition Classes
    SerializingImports
  69. def toLambdaDS(ruleSuite: RuleSuite): Dataset[LambdaFunctionRow]

    Must have an active sparksession before calling and only works with ExpressionRule's, all other rules are converted to 1=1

    returns

    a Dataset[LambdaFunctionRow] with the rules flattened out

    Definition Classes
    SerializingImports
  70. def toOutputExpressionDS(ruleSuite: RuleSuite): Dataset[OutputExpressionRow]

    Must have an active sparksession before calling and only works with ExpressionRule's, all other rules are converted to 1=1

    returns

    a Dataset[OutputExpressionRow] with the rules flattened out

    Definition Classes
    SerializingImports
  71. def toRuleSuiteDF(ruleSuite: RuleSuite): DataFrame

    Utility function to ease dealing with simple DQ rules where the rule engine functionality is ignored. This can be paired with the default ruleEngine parameter in readRulesFromDF

    Definition Classes
    SerializingImports
  72. def toString(): String
    Definition Classes
    Any
  73. def typedExpressionRunner(ruleSuite: RuleSuite, ddlType: String, name: String = "expressionResults", forceRunnerEval: Boolean = false): Column

    Runs the ruleSuite expressions saving results as a tuple of (ruleResult: String, resultDDL: String)

    Definition Classes
    ExpressionRunnerImports
  74. def validate(schemaOrFrame: Either[StructType, DataFrame], ruleSuite: RuleSuite, showParams: ShowParams = ShowParams(), runnerFunction: Option[(DataFrame) ⇒ Column] = None, qualityName: String = "Quality", recursiveLambdasSOEIsOk: Boolean = false, transformBeforeShow: (DataFrame) ⇒ DataFrame = identity, viewLookup: (String) ⇒ Boolean = Validation.defaultViewLookup): (Set[RuleError], Set[RuleWarning], String, RuleSuiteDocs, Map[IdTrEither, ExpressionLookup])

    For a given dataFrame provide a full set of any validation errors for a given ruleSuite.

    schemaOrFrame

    when it's a Left( StructType ) the struct will be used to test against and an emptyDataframe of this type created to resolve on the spark level. Using Right(DataFrame) will cause that dataframe to be used which is great for test cases with a runnerFunction

    showParams

    - configure how the output text is formatted using the same options and formatting as dataFrame.show

    runnerFunction

    - allows you to create a ruleRunner or ruleEngineRunner with different configurations

    qualityName

    - the column name to store the runnerFunction results in

    recursiveLambdasSOEIsOk

    - this signals that finding a recursive lambda SOE should not stop the evaluations - if true it will still try to run any runnerFunction but may not give the correct results

    transformBeforeShow

    - an optional transformation function to help shape what results are pushed to show

    viewLookup

    - for any subquery used looks up the view name for being present (quoted and with schema), defaults to the current spark catalogue

    returns

    A set of errors and the output from the dataframe when a runnerFunction is specified

    Definition Classes
    ValidationImports
  75. def validate(frame: DataFrame, ruleSuite: RuleSuite, runnerFunction: (DataFrame) ⇒ Column, transformBeforeShow: (DataFrame) ⇒ DataFrame): (Set[RuleError], Set[RuleWarning], String, RuleSuiteDocs, Map[IdTrEither, ExpressionLookup])

    For a given dataFrame provide a full set of any validation errors for a given ruleSuite.

    frame

    when it's a Left( StructType ) the struct will be used to test against and an emptyDataframe of this type created to resolve on the spark level. Using Right(DataFrame) will cause that dataframe to be used which is great for test cases with a runnerFunction

    runnerFunction

    - allows you to create a ruleRunner or ruleEngineRunner with different configurations

    transformBeforeShow

    - an optional transformation function to help shape what results are pushed to show

    returns

    A set of errors and the output from the dataframe when a runnerFunction is specified

    Definition Classes
    ValidationImports
  76. def validate(frame: DataFrame, ruleSuite: RuleSuite, runnerFunction: (DataFrame) ⇒ Column, transformBeforeShow: (DataFrame) ⇒ DataFrame, viewLookup: (String) ⇒ Boolean): (Set[RuleError], Set[RuleWarning], String, RuleSuiteDocs, Map[IdTrEither, ExpressionLookup])

    For a given dataFrame provide a full set of any validation errors for a given ruleSuite.

    frame

    when it's a Left( StructType ) the struct will be used to test against and an emptyDataframe of this type created to resolve on the spark level. Using Right(DataFrame) will cause that dataframe to be used which is great for test cases with a runnerFunction

    runnerFunction

    - allows you to create a ruleRunner or ruleEngineRunner with different configurations

    transformBeforeShow

    - an optional transformation function to help shape what results are pushed to show

    returns

    A set of errors and the output from the dataframe when a runnerFunction is specified

    Definition Classes
    ValidationImports
  77. def validate(frame: DataFrame, ruleSuite: RuleSuite, runnerFunction: (DataFrame) ⇒ Column): (Set[RuleError], Set[RuleWarning], String, RuleSuiteDocs, Map[IdTrEither, ExpressionLookup])

    For a given dataFrame provide a full set of any validation errors for a given ruleSuite.

    frame

    when it's a Left( StructType ) the struct will be used to test against and an emptyDataframe of this type created to resolve on the spark level. Using Right(DataFrame) will cause that dataframe to be used which is great for test cases with a runnerFunction

    runnerFunction

    - allows you to create a ruleRunner or ruleEngineRunner with different configurations

    returns

    A set of errors and the output from the dataframe when a runnerFunction is specified

    Definition Classes
    ValidationImports
  78. def validate(frame: DataFrame, ruleSuite: RuleSuite): (Set[RuleError], Set[RuleWarning])

    For a given dataFrame provide a full set of any validation errors for a given ruleSuite.

    frame

    when it's a Left( StructType ) the struct will be used to test against and an emptyDataframe of this type created to resolve on the spark level. Using Right(DataFrame) will cause that dataframe to be used which is great for test cases with a runnerFunction

    returns

    A set of errors and the output from the dataframe when a runnerFunction is specified

    Definition Classes
    ValidationImports
  79. def validate(schema: StructType, ruleSuite: RuleSuite): (Set[RuleError], Set[RuleWarning])

    For a given dataFrame provide a full set of any validation errors for a given ruleSuite.

    schema

    which fields should the dataframe have

    returns

    A set of errors and the output from the dataframe when a runnerFunction is specified

    Definition Classes
    ValidationImports
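
    Example (an illustrative sketch; tradesDf and ruleSuite are assumed to exist):

      // resolve the suite against the schema only, without evaluating any data
      val (errors, warnings) = validate(tradesDf.schema, ruleSuite)
      if (errors.nonEmpty) sys.error(s"RuleSuite does not resolve: $errors")
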
  80. def validate_Lookup(frame: DataFrame, ruleSuite: RuleSuite, runnerFunction: (DataFrame) ⇒ Column, viewLookup: (String) ⇒ Boolean): (Set[RuleError], Set[RuleWarning], String, RuleSuiteDocs, Map[IdTrEither, ExpressionLookup])

    For a given dataFrame provide a full set of any validation errors for a given ruleSuite.

    frame

    when it's a Left( StructType ) the struct will be used to test against and an emptyDataframe of this type created to resolve on the spark level. Using Right(DataFrame) will cause that dataframe to be used which is great for test cases with a runnerFunction

    runnerFunction

    - allows you to create a ruleRunner or ruleEngineRunner with different configurations

    returns

    A set of errors and the output from the dataframe when a runnerFunction is specified

    Definition Classes
    ValidationImports
  81. def validate_Lookup(frame: DataFrame, ruleSuite: RuleSuite, viewLookup: (String) ⇒ Boolean = Validation.defaultViewLookup): (Set[RuleError], Set[RuleWarning])

    For a given dataFrame provide a full set of any validation errors for a given ruleSuite.

    frame

    when it's a Left( StructType ) the struct will be used to test against and an emptyDataframe of this type created to resolve on the spark level. Using Right(DataFrame) will cause that dataframe to be used which is great for test cases with a runnerFunction

    returns

    A set of errors and the output from the dataframe when a runnerFunction is specified

    Definition Classes
    ValidationImports
  82. def validate_Lookup(schema: StructType, ruleSuite: RuleSuite, viewLookup: (String) ⇒ Boolean): (Set[RuleError], Set[RuleWarning])

    For a given dataFrame provide a full set of any validation errors for a given ruleSuite.

    schema

    which fields should the dataframe have

    returns

    A set of errors and the output from the dataframe when a runnerFunction is specified

    Definition Classes
    ValidationImports
  83. object BloomModel extends Serializable
  84. object DisabledRule extends RuleResult with Product with Serializable

    This shouldn't evaluate to a fail; it allows signalling that a rule has been disabled

  85. object Failed extends RuleResult with Product with Serializable
  86. object LambdaFunction
  87. object Passed extends RuleResult with Product with Serializable
  88. object QualityException extends Serializable
  89. object RuleSuiteResultDetails extends Serializable
  90. object RunOnPassProcessor
  91. object SoftFailed extends RuleResult with Product with Serializable

    This shouldn't evaluate to a fail, think of it as Amber / Warn

Inherited from ExpressionRunnerImports

Inherited from ViewLoading

Inherited from RuleFolderRunnerImports

Inherited from ValidationImports

Inherited from RuleEngineRunnerImports

Inherited from LambdaFunctionsImports

Inherited from AddDataFunctionsImports

Inherited from SerializingImports

Inherited from BloomFilterLookupImports

Inherited from LookupIdFunctionsImports

Inherited from MapLookupImports

Inherited from Serializable

Inherited from Serializable

Inherited from RuleRunnerImports

Inherited from BloomFilterRegistration

Inherited from RuleRunnerFunctionsImport

Inherited from BloomFilterTypes

Inherited from AnyRef

Inherited from Any
