package org.apache.spark.sql.qualityFunctions

Type Members

  1. class DoCodegenFallbackHandler extends LambdaCompilationHandler

    Defaults to calling codeGen, which may be either an original compilation approach or CodegenFallback, depending on the implementation. It evaluates the entire tree of expr and returns all NamedLambdaVariables, as they will not be using the same compilation approach.

    This is the default for known OSS implementations and should also be used if compilation will not take place within the same class.

  2. trait FunDoGenCode extends Expression with CodegenFallback

    Generates code for any FunX, including nested ones; the normal doGenCode defaults to CodegenFallback.

  3. case class FunForward(children: Seq[Expression]) extends Expression with CodegenFallback with Product with Serializable

    Forwards calls to the function arguments via setters. This is only evaluated in aggExpr; all other usages are removed during lambda creation.

    This removal may also be forced in aggExpr at a later stage.

  4. case class FunN(arguments: Seq[Expression], function: Expression, name: Option[String] = None, processed: Boolean = false, attemptCodeGen: Boolean = false) extends Expression with HigherOrderFunctionLike with CodegenFallback with SeqArgs with FunDoGenCode with Product with Serializable

    Lambda function with multiple args, typically created with placeholder AtomicRefExpression args.

    arguments

    Evaluated to provide input to the function lambda

    function

    the actual lambda function

    name

    the lambda name when available

  5. case class MapMerge(children: Seq[Expression], addF: (DataType) ⇒ Option[(Expression, Expression) ⇒ Expression]) extends Expression with CodegenFallback with Product with Serializable

    Transforms a map

    children

    Seq of maps of type x to y; all of the maps must have the same key and value types

    addF

    function to derive the add expr for monoidal add on values

  6. case class MapTransform(argument: Expression, key: Expression, function: Expression, zeroF: (DataType) ⇒ Option[Any]) extends Expression with HigherOrderFunctionLike with CodegenFallback with SeqArgs with Product with Serializable

    Transforms a map

    argument

    map of type x to y

    key

    expr for key

    function

    value to value transformation for that key entry

  7. case class NamedLambdaVariableCodeGen(name: String, dataType: DataType, nullable: Boolean, exprId: ExprId, valueRef: String) extends LeafExpression with NamedExpression with Product with Serializable

    Replaces NamedLambdaVariables for simple inlined codegen.

  8. case class PlaceHolderExpression(dataType: DataType, nullable: Boolean = true) extends LeafExpression with Unevaluable with Product with Serializable

    Only used with Lambda placeholders, defaults to allowing nullable values

  9. trait RefCodeGen extends AnyRef
  10. case class RefExpression(dataType: DataType, nullable: Boolean = true, index: Int = -1) extends LeafExpression with RefCodeGen with Product with Serializable

    Getter, trimmed version of NamedLambdaVariable as it should never be resolved

  11. case class RefExpressionLazyType(dataTypeF: () ⇒ DataType, nullable: Boolean) extends LeafExpression with RefCodeGen with Product with Serializable

    Getter, trimmed version of NamedLambdaVariable as it should never be resolved

  12. case class RefSetterExpression(children: Seq[Expression]) extends Expression with CodegenFallback with Product with Serializable

    Wraps other expressions and stores the result in a RefExpression.

  13. case class RunAllReturnLast(children: Seq[Expression]) extends Expression with CodegenFallback with Product with Serializable

    Runs all of the children and returns the eval result of the last one, which allows stitching together lambdas with aggregates.

  14. trait SeqArgs extends AnyRef

Value Members

  1. object FunCall
  2. object LambdaCompilationUtils

    Functionality related to LambdaCompilation. Seemingly all HigherOrderFunctions use a lazy val match to extract the NamedLambdaVariables from the Spark LambdaFunction after bind has been called. By the time doGenCode is called, eval _could_ already have been called and the lazy val evaluated, so simply rewriting the tree may not fully work. Additionally, the type for NamedLambdaVariable is bound in the lazy vals, which means _ANY_ HigherOrderFunction may not tolerate swapping out NamedLambdaVariables for another NamedExpression.

    To add to the fun, the open-source Spark HoFs all use CodegenFallback, as does NamedLambdaVariable, so it's possible to swap out some of these implementations if an array_transform is nested in a Fun1 or Fun2. Similarly, a Fun1 can call a Fun2, so the assumptions for each Fun1/FunN doGenCode are:

    1. Use the processLambda function to evaluate the function.
    2. compilationHandlers uses the quality.lambdaHandlers environment variable to load a comma-separated list of fqn=handler pairs.
    3. For each fully qualified class name pair (e.g. org.apache.spark.sql.catalyst.expressions.ZipWith=handler.fqn) the handler is loaded.
    4. processLambda then evaluates the expression tree; for each matching HoF class name it calls the handler.
    5. Handlers perform the custom doGenCode for that expression rather than the default OSS CodegenFallback.
    6. Handlers return the ExprCode AND a list of NamedLambdaVariables which must have .value.set called upon them (i.e. we can't optimise them).
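
    The fqn=handler list format from step 2 can be sketched in plain Scala. This is a minimal illustration, not the library's implementation: HandlerPairs and parseHandlerPairs are hypothetical names, and the real compilationHandlers may treat whitespace and malformed entries differently.

    ```scala
    object HandlerPairs {
      // Hypothetical helper: splits a comma-separated list of fqn=handler pairs
      // into a Map keyed by the HoF's fully qualified class name.
      // Entries that do not have exactly one '=' with text on both sides are dropped.
      def parseHandlerPairs(raw: String): Map[String, String] =
        raw.split(',').toList
          .map(_.trim)
          .filter(_.nonEmpty)
          .map(_.split('='))
          .collect {
            case Array(fqn, handler) if fqn.trim.nonEmpty && handler.trim.nonEmpty =>
              fqn.trim -> handler.trim
          }
          .toMap
    }
    ```

    Keying the result by class name mirrors step 4, where each HoF class name found in the expression tree is looked up to find its handler.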

    NB: The fqn will also be used to check for named lambdas used through registerLambdaFunctions.

    https://github.com/apache/spark/pull/21954 introduced the lambda variable with AtomicReference, with its inherent performance hit, and, due to the difficulty of threading the holder through the expression chain, did not have a compilation approach. Once the holder is threaded and bind has been called, the variable id is stable, as is the AtomicReference, so it can be swapped out for a simple variable in the same object.
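
    The difference between the two styles can be pictured with a plain Scala analogy. This is only a sketch: HolderSketch and both classes are hypothetical and are not Spark or Quality code; they contrast reading an argument through an AtomicReference holder with reading it from a plain field of the same object.

    ```scala
    import java.util.concurrent.atomic.AtomicReference

    object HolderSketch {
      // Interpreted style: the lambda body reads its argument through a shared holder.
      final class ViaAtomicReference {
        private val holder = new AtomicReference[Int]()
        def applyTo(x: Int): Int = {
          holder.set(x)    // analogous to calling .value.set on the lambda variable
          holder.get() + 1 // the body reads the argument back from the holder
        }
      }

      // Compiled style: holder and body live in the same object, so a plain field is enough.
      final class ViaPlainField {
        private var arg: Int = 0
        def applyTo(x: Int): Int = {
          arg = x
          arg + 1
        }
      }
    }
    ```

    Both classes compute the same result; the second avoids the indirection through the reference on every call.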

    quality.lambdaHandlers overrides the default for a given platform on a per-fqn basis, so you only need to "add" or "replace" the HoFs that cause issues (for example TransformValues), not the entire list of OSS HigherOrderFunctions. Note that some versions of Databricks provide their own compilation of HoFs, which may not be compatible with this approach.

    Disable this approach by using quality.lambdaHandlers to map FunN to the default DoCodegenFallbackHandler:

    quality.lambdaHandlers=org.apache.spark.sql.qualityFunctions.FunN=org.apache.spark.sql.qualityFunctions.DoCodegenFallbackHandler
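
    The per-fqn override behaviour can be pictured as a map merge. This is a sketch under assumptions: HandlerOverrides and effective are hypothetical names, example.SomeCompilingHandler is a made-up stand-in for whatever a platform's defaults are, and the real lookup may be implemented differently.

    ```scala
    object HandlerOverrides {
      // Hypothetical platform defaults; example.SomeCompilingHandler is not a real class.
      val defaults: Map[String, String] = Map(
        "org.apache.spark.sql.qualityFunctions.FunN" -> "example.SomeCompilingHandler",
        "org.apache.spark.sql.catalyst.expressions.TransformValues" -> "example.SomeCompilingHandler"
      )

      // The quality.lambdaHandlers value that maps FunN to DoCodegenFallbackHandler.
      val configured: String =
        "org.apache.spark.sql.qualityFunctions.FunN=" +
          "org.apache.spark.sql.qualityFunctions.DoCodegenFallbackHandler"

      // Entries on the right of ++ win for the same key, so only the configured
      // fqns are replaced and every other default is kept.
      def effective(defaults: Map[String, String], raw: String): Map[String, String] = {
        val overrides = raw.split(',').toList
          .map(_.trim)
          .filter(_.nonEmpty)
          .map(_.split('='))
          .collect { case Array(fqn, handler) => fqn.trim -> handler.trim }
          .toMap
        defaults ++ overrides
      }
    }
    ```

    With these inputs only the FunN entry changes; the TransformValues entry keeps its default handler.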

  3. object LambdaFunctions
  4. object MapTransform extends Serializable
  5. object SeqArgs
  6. object SubQueryLambda
  7. object utils
