package qualityFunctions
Type Members
-
class
DoCodegenFallbackHandler extends LambdaCompilationHandler
Defaults to calling codeGen, which may be either an original compilation approach or the CodegenFallback, depending on the implementation. It evaluates the entire tree of expr and returns all NamedLambdaVariables, as they will not be using the same compilation approach.
This is the default for known OSS implementations and should also be used if compilation will not be within the same class.
-
trait
FunDoGenCode extends Expression with CodegenFallback
Generates code for any FunX, including nested ones; the normal doGenCode defaults to CodegenFallback.
-
case class
FunForward(children: Seq[Expression]) extends Expression with CodegenFallback with Product with Serializable
Forwards calls to the function arguments via setters. This is only evaluated in aggExpr; all other usages are removed during lambda creation.
This removal may also be forced in aggExpr at a later stage.
-
case class
FunN(arguments: Seq[Expression], function: Expression, name: Option[String] = None, processed: Boolean = false, attemptCodeGen: Boolean = false) extends Expression with HigherOrderFunctionLike with CodegenFallback with SeqArgs with FunDoGenCode with Product with Serializable
Lambda function with multiple args, typically created with placeholder AtomicRefExpression args.
- arguments
Evaluated to provide input to the function lambda
- function
the actual lambda function
- name
the lambda name when available
-
case class
MapMerge(children: Seq[Expression], addF: (DataType) ⇒ Option[(Expression, Expression) ⇒ Expression]) extends Expression with CodegenFallback with Product with Serializable
Transforms a map
- children
seq of maps of type x to y; they must all have the same types
- addF
function to derive the add expr for monoidal add on values
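As a rough illustration of what a "monoidal add on values" could mean for a sequence of same-typed maps, the sketch below merges plain Scala maps and combines the values of shared keys with an add function. It assumes MapMerge behaves this way for shared keys, which the text above does not spell out. The names MapMergeSketch and mergeMaps are hypothetical and are not part of this package; the real MapMerge works on Spark expressions and derives the add expression from the value DataType via addF.

```scala
object MapMergeSketch {
  // Hypothetical helper using plain collections only, not Spark expressions.
  // Folds the maps left to right; when a key already exists, the existing
  // value and the new value are combined with `add`.
  def mergeMaps[K, V](maps: Seq[Map[K, V]])(add: (V, V) => V): Map[K, V] =
    maps.foldLeft(Map.empty[K, V]) { (acc, next) =>
      next.foldLeft(acc) { case (merged, (k, v)) =>
        merged.updated(k, merged.get(k).map(add(_, v)).getOrElse(v))
      }
    }
}
```

For integer values, passing `_ + _` as the add function sums the values of keys that appear in more than one map and leaves the other entries untouched.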
-
case class
MapTransform(argument: Expression, key: Expression, function: Expression, zeroF: (DataType) ⇒ Option[Any]) extends Expression with HigherOrderFunctionLike with CodegenFallback with SeqArgs with Product with Serializable
Transforms a map
- argument
map of type x to y
- key
expr for key
- function
value to value transformation for that key entry
-
case class
NamedLambdaVariableCodeGen(name: String, dataType: DataType, nullable: Boolean, exprId: ExprId, valueRef: String) extends LeafExpression with NamedExpression with Product with Serializable
Replaces NamedLambdaVariables for simple inlined codegen.
-
case class
PlaceHolderExpression(dataType: DataType, nullable: Boolean = true) extends LeafExpression with Unevaluable with Product with Serializable
Only used with Lambda placeholders; defaults to allowing nullable values.
- trait RefCodeGen extends AnyRef
-
case class
RefExpression(dataType: DataType, nullable: Boolean = true, index: Int = -1) extends LeafExpression with RefCodeGen with Product with Serializable
Getter, trimmed version of NamedLambdaVariable as it should never be resolved
-
case class
RefExpressionLazyType(dataTypeF: () ⇒ DataType, nullable: Boolean) extends LeafExpression with RefCodeGen with Product with Serializable
Getter, trimmed version of NamedLambdaVariable as it should never be resolved
-
case class
RefSetterExpression(children: Seq[Expression]) extends Expression with CodegenFallback with Product with Serializable
Wraps other expressions and stores the result in a RefExpression.
-
case class
RunAllReturnLast(children: Seq[Expression]) extends Expression with CodegenFallback with Product with Serializable
Runs all of the children and returns the eval result of the last one; this allows stitching together lambdas with aggregates.
- trait SeqArgs extends AnyRef
Value Members
- object FunCall
-
object
LambdaCompilationUtils
Functionality related to LambdaCompilation. Seemingly all HigherOrderFunctions use a lazy val match to extract the NamedLambdaVariables from the Spark LambdaFunction after bind has been called. When doGenCode is called, eval _could_ already have been called and the lazy val evaluated, so simply rewriting the tree may not fully work. Additionally, the type for NamedLambdaVariable is bound in the lazy vals, which means _ANY_ HigherOrderFunction may not tolerate swapping out NamedLambdaVariables for another NamedExpression.
To add to the fun, open source Spark HoFs all use CodegenFallback, as does NamedLambdaVariable, so it's possible to swap out some of these implementations if an array_transform is nested in a Fun1 or Fun2. Similarly, a Fun1 can call a Fun2, so the assumptions for each Fun1/FunN doGenCode are:
1. Use the processLambda function to evaluate the function.
2. compilationHandlers uses the quality.lambdaHandlers environment variable to load a comma-separated list of fqn=handler pairs.
3. For each fully qualified class name pair (e.g. org.apache.spark.sql.catalyst.expressions.ZipWith=handler.fqn) the handler is loaded.
4. processLambda then evaluates the expression tree; for each matching HoF class name it calls the handler.
5. Handlers perform the custom doGenCode for that expression rather than the default OSS CodegenFallback.
6. Handlers return the ExprCode AND a list of NamedLambdaVariables which must have .value.set called upon them (i.e. we can't optimise them).
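The comma-separated fqn=handler format described in the steps above can be sketched with a small parser. This is only an illustration of the setting's shape: LambdaHandlerPairs and parseHandlers are hypothetical names, and the library's own loading logic (including how it instantiates handlers) may differ.

```scala
object LambdaHandlerPairs {
  // Hypothetical helper: splits "fqn=handler,fqn=handler" into a Map of
  // expression class name -> handler class name.
  // Entries without both a name and a handler are skipped.
  def parseHandlers(setting: String): Map[String, String] =
    setting
      .split(',')
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("=", 2))
      .collect {
        case Array(fqn, handler) if fqn.trim.nonEmpty && handler.trim.nonEmpty =>
          fqn.trim -> handler.trim
      }
      .toMap
}
```

Given the example pair from the list, `LambdaHandlerPairs.parseHandlers("org.apache.spark.sql.catalyst.expressions.ZipWith=handler.fqn")` yields a single entry keyed by the ZipWith class name; a later entry for the same fqn would replace an earlier one, which matches the per-fqn override idea only by assumption.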
NB: the fqn will also be used to check for named lambdas used through registerLambdaFunctions.
https://github.com/apache/spark/pull/21954 introduced the lambda variable with AtomicReference and its inherent performance hit and, due to the difficulty of threading the holder through the expression chain, did not have a compilation approach. Once it is threaded and bind has been called, the variable id is stable, as is the AtomicReference, so it can be swapped out for a simple variable in the same object.
quality.lambdaHandlers will override the default for a given platform on an fqn basis, so you only need to "add" or "replace" the HoFs that cause issues, for example TransformValues, not the entire list of OSS HigherOrderFunctions. Note that some versions of Databricks provide compilation of their HoFs that may not be compatible in approach.
Disable this approach by using quality.lambdaHandlers to map FunN to the default DoCodegenFallbackHandler: quality.lambdaHandlers=org.apache.spark.sql.qualityFunctions.FunN=org.apache.spark.sql.qualityFunctions.DoCodegenFallbackHandler
- object LambdaFunctions
- object MapTransform extends Serializable
- object SeqArgs
- object SubQueryLambda
- object utils