package qualityFunctions
Type Members
-
trait
Digest extends AnyRef
Basic digest implementation for Array[Long] based hashes
-
trait
DigestFactory extends Serializable
Factory to get a new or reset digest for each row
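A minimal sketch of the "new or reset digest per row" pattern, assuming a digest that folds longs into a fixed-width Array[Long] state. The names `LongsDigest`, `ToyDigest`, `LongsDigestFactory` and `ReusingDigestFactory` are hypothetical and do not reflect Quality's actual Digest or DigestFactory signatures. The mixing in `ToyDigest` is not a real hash; it only gives the sketch some state to reset.

```scala
// Hypothetical shapes only: they mirror the descriptions of Digest and DigestFactory,
// not Quality's actual signatures.
trait LongsDigest {
  def update(value: Long): Unit
  def result: Array[Long] // an Array[Long] based hash
  def reset(): Unit
}

// Toy digest with two 64-bit accumulators. This is NOT a real hash function.
final class ToyDigest extends LongsDigest {
  private val state = new Array[Long](2)
  def update(value: Long): Unit = {
    state(0) = state(0) * 31L + value
    state(1) = java.lang.Long.rotateLeft(state(1) ^ value, 7)
  }
  def result: Array[Long] = state.clone() // copy, so later resets do not alter returned results
  def reset(): Unit = java.util.Arrays.fill(state, 0L)
}

trait LongsDigestFactory extends Serializable {
  def digestFor(): LongsDigest
}

// Reuses one digest per factory instance and resets it before each row, avoiding a
// per-row allocation. The digest is @transient so it is rebuilt after deserialisation.
final class ReusingDigestFactory extends LongsDigestFactory {
  @transient private lazy val digest: LongsDigest = new ToyDigest
  def digestFor(): LongsDigest = { digest.reset(); digest }
}
```

Handing out a reset instance rather than a new one trades thread-safety for fewer allocations. This assumes a single task uses the factory at a time.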
-
class
DoCodegenFallbackHandler extends LambdaCompilationHandler
Defaults to calling codeGen; this can be either an original compilation approach or the CodegenFallback, depending on the implementation. It evaluates the entire tree of expr and returns all NamedLambdaVariables, as they will not be using the same compilation approach.
This is the default for known OSS implementations and should also be used if compilation will not happen within the same class.
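The fallback behaviour can be sketched on a toy expression tree. The handler keeps the interpreted path for the whole subtree and reports every lambda variable it finds, because none of them can be replaced by an inlined local. `Node`, `LambdaVar`, `Lit`, `Call`, `Code` and `fallbackHandle` are hypothetical stand-ins for Catalyst's Expression, NamedLambdaVariable and ExprCode, not the real handler interface.

```scala
object FallbackSketch {
  sealed trait Node { def children: Seq[Node] }
  final case class LambdaVar(name: String, id: Long) extends Node { def children: Seq[Node] = Nil }
  final case class Lit(value: Int) extends Node { def children: Seq[Node] = Nil }
  final case class Call(fn: String, children: Seq[Node]) extends Node

  // Stand-in for ExprCode: just a description of the generated code.
  final case class Code(text: String)

  // Depth-first collection of every lambda variable in the subtree.
  def collectVars(n: Node): Seq[LambdaVar] = n match {
    case v: LambdaVar => Seq(v)
    case other        => other.children.flatMap(collectVars)
  }

  // Fallback: emit a call into the interpreted path and return all variables
  // (deduplicated), so the caller knows each one must still have its value set.
  def fallbackHandle(n: Node): (Code, Seq[LambdaVar]) =
    (Code(s"/* CodegenFallback eval of ${n.getClass.getName} */"), collectVars(n).distinct)
}
```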
-
trait
FunDoGenCode extends Expression with CodegenFallback
Generates code for any FunX, including nested ones; the normal doGenCode defaults to CodegenFallback.
-
case class
FunForward(children: Seq[Expression]) extends Expression with CodegenFallback with Product with Serializable
Forwards calls to the function arguments via setters. This is only evaluated in aggExpr; all other usages are removed during lambda creation.
This removal may also be forced in aggExpr at a later stage.
-
case class
FunN(arguments: Seq[Expression], function: Expression, name: Option[String] = None, processed: Boolean = false, attemptCodeGen: Boolean = false) extends Expression with HigherOrderFunction with CodegenFallback with SeqArgs with FunDoGenCode with Product with Serializable
Lambda function with multiple arguments, typically created with placeholder AtomicRefExpression arguments.
- arguments
Evaluated to provide input to the function lambda
- function
the actual lambda function
- name
the lambda name when available
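The following toy interpreter sketches how a multi-argument lambda such as FunN can be evaluated. The arguments are evaluated first, their values are bound to positional slots, and the body reads those slots by index. `Expr`, `Slot` and `Fun` are hypothetical and only loosely modelled on FunN and its placeholder arguments. This is not Quality's evaluation code.

```scala
object FunNSketch {
  sealed trait Expr
  final case class Lit(v: Int) extends Expr
  final case class Slot(index: Int) extends Expr // placeholder argument, read by position
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class Mul(l: Expr, r: Expr) extends Expr
  final case class Fun(arguments: Seq[Expr], function: Expr) extends Expr

  def eval(e: Expr, slots: Array[Int]): Int = e match {
    case Lit(v)    => v
    case Slot(i)   => slots(i)
    case Add(l, r) => eval(l, slots) + eval(r, slots)
    case Mul(l, r) => eval(l, slots) * eval(r, slots)
    case Fun(args, body) =>
      // Arguments are evaluated in the caller's scope, then bound to fresh slots for the body.
      val bound = args.map(a => eval(a, slots)).toArray
      eval(body, bound)
  }
}
```

For example, `Fun(Seq(Lit(3), Lit(4)), Add(Mul(Slot(0), Slot(0)), Slot(1)))` evaluates to 3 * 3 + 4 = 13. Nested lambdas get their own slots, so an inner Slot(0) does not clash with an outer one.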
-
abstract
class
HashLongsExpression extends Expression with CodegenFallback
A function that calculates a hash value for a group of expressions. Note that the seed argument is not exposed to users and should only be set inside Spark SQL.
The hash value for an expression depends on its type and seed:
- null: seed
- boolean: turn the boolean into an int, 1 for true and 0 for false, then use murmur3 to hash this int with seed.
- byte, short, int: use murmur3 to hash the input as an int with seed.
- long: use murmur3 to hash the long input with seed.
- float: turn it into an int with java.lang.Float.floatToIntBits(input), and hash it.
- double: turn it into a long with java.lang.Double.doubleToLongBits(input), and hash it.
- decimal: if it is a small decimal, i.e. precision <= 18, turn it into a long and hash it; otherwise, turn it into bytes and hash them.
- calendar interval: hash microseconds first, and use the result as the seed to hash months.
- interval day to second: stored as a long value of microseconds; use murmur3 to hash the long input with seed.
- interval year to month: stored as an int value of months; use murmur3 to hash the int input with seed.
- binary: use murmur3 to hash the bytes with seed.
- string: get the bytes of the string and hash them.
- array: result starts as seed; then, using result as the seed, recursively calculate the hash value for each element and assign the element hash value to result.
- struct: result starts as seed; then, using result as the seed, recursively calculate the hash value for each field and assign the field hash value to result.
Finally, the hash values of all the expressions are aggregated in the same way as the fields of a struct.
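A minimal sketch of the seed-chaining rules above, for a subset of the types. It uses the mixing primitives of Scala's `scala.util.hashing.MurmurHash3`, arranged to follow the structure of a 32-bit murmur3 int/long hash. The `Value` ADT is hypothetical. The sketch illustrates the rules as listed and is not Quality's Array[Long] based implementation. The default seed of 42 matches the seed Spark's SQL `hash` function uses.

```scala
import scala.util.hashing.MurmurHash3

object HashRulesSketch {
  // Hypothetical value model covering only some of the listed types.
  sealed trait Value
  case object NullV extends Value
  final case class BoolV(b: Boolean) extends Value
  final case class IntV(i: Int) extends Value
  final case class LongV(l: Long) extends Value
  final case class DoubleV(d: Double) extends Value
  final case class ArrayV(elems: Seq[Value]) extends Value
  final case class StructV(fields: Seq[Value]) extends Value

  // One 4-byte block mixed into the seed, then finalised with length 4.
  def hashInt(i: Int, seed: Int): Int =
    MurmurHash3.finalizeHash(MurmurHash3.mix(seed, i), 4)

  // Low 32 bits first, then high 32 bits, finalised with length 8.
  def hashLong(l: Long, seed: Int): Int = {
    val h = MurmurHash3.mix(MurmurHash3.mix(seed, l.toInt), (l >>> 32).toInt)
    MurmurHash3.finalizeHash(h, 8)
  }

  def hash(v: Value, seed: Int): Int = v match {
    case NullV      => seed // null: the seed passes through unchanged
    case BoolV(b)   => hashInt(if (b) 1 else 0, seed)
    case IntV(i)    => hashInt(i, seed)
    case LongV(l)   => hashLong(l, seed)
    case DoubleV(d) => hashLong(java.lang.Double.doubleToLongBits(d), seed)
    // array and struct: result starts at seed, and each element's hash becomes the next seed
    case ArrayV(es)  => es.foldLeft(seed)((result, e) => hash(e, result))
    case StructV(fs) => fs.foldLeft(seed)((result, f) => hash(f, result))
  }

  // A group of expressions is aggregated the same way as a struct.
  def hashAll(values: Seq[Value], seed: Int = 42): Int = hash(StructV(values), seed)
}
```

Because null returns the seed unchanged, a null field does not alter the running result. Two rows that differ only by an extra null column can therefore hash equally.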
-
abstract
class
InterpretedHashLongsFunction extends AnyRef
Base class for interpreted hash functions.
-
case class
MapMerge(children: Seq[Expression], addF: (DataType) ⇒ Option[(Expression, Expression) ⇒ Expression]) extends Expression with CodegenFallback with Product with Serializable
Transforms a map
- children
Sequence of maps of type x to y; they must all have the same types.
- addF
Function to derive the add expression used for the monoidal addition of values.
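A minimal sketch of a monoidal map merge in the spirit of MapMerge, working on plain Scala maps rather than Catalyst expressions. `addFor` is a hypothetical stand-in for addF: it takes a value-type name instead of a Spark DataType and returns None when no add is known. The left-to-right merge order is also an assumption.

```scala
object MapMergeSketch {
  // Hypothetical stand-in for addF: given a value type name (instead of a Spark DataType),
  // optionally return the monoidal "add" used to combine values. Only "long" is known here.
  def addFor(valueType: String): Option[(Long, Long) => Long] = valueType match {
    case "long" => Some((a: Long, b: Long) => a + b)
    case _      => None
  }

  // Merges maps left to right; a key present in several maps has its values combined with add.
  def merge[K, V](maps: Seq[Map[K, V]], add: (V, V) => V): Map[K, V] =
    maps.foldLeft(Map.empty[K, V]) { (acc, m) =>
      m.foldLeft(acc) { case (merged, (k, v)) =>
        merged.updated(k, merged.get(k).fold(v)(existing => add(existing, v)))
      }
    }
}
```

The empty map is the identity for this merge. When the add is associative, the result does not depend on how the sequence of maps is grouped.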
-
case class
MapTransform(argument: Expression, key: Expression, function: Expression, zeroF: (DataType) ⇒ Option[Any]) extends Expression with HigherOrderFunction with CodegenFallback with SeqArgs with Product with Serializable
Transforms a map
- argument
Map of type x to y
- key
Expression for the key
- function
value to value transformation for that key entry
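A minimal sketch of transforming the value stored under one key, on plain Scala maps. It ASSUMES that zeroF supplies a starting value when the key is absent. The page does not document that behaviour, so treat it as a guess. `transformKey` and its `zero` parameter are hypothetical.

```scala
object MapTransformSketch {
  // Applies `function` to the value stored under `key`. ASSUMPTION: when the key is
  // missing, the value from `zero` (standing in for zeroF) is transformed instead;
  // if no zero is available, the map is returned unchanged.
  def transformKey[K, V](map: Map[K, V], key: K, function: V => V, zero: Option[V]): Map[K, V] =
    map.get(key).orElse(zero) match {
      case Some(current) => map.updated(key, function(current))
      case None          => map
    }
}
```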
-
case class
NamedLambdaVariableCodeGen(name: String, dataType: DataType, nullable: Boolean, exprId: ExprId, valueRef: String) extends LeafExpression with NamedExpression with Product with Serializable
Replaces NamedLambdaVariables for simple inlined codegen.
-
case class
PlaceHolderExpression(dataType: DataType, nullable: Boolean = true) extends LeafExpression with Unevaluable with Product with Serializable
Only used with lambda placeholders; defaults to allowing nullable values.
- trait RefCodeGen extends AnyRef
-
case class
RefExpression(dataType: DataType, nullable: Boolean = true, index: Int = -1) extends LeafExpression with RefCodeGen with Product with Serializable
Getter; a trimmed version of NamedLambdaVariable, as it should never be resolved.
-
case class
RefExpressionLazyType(dataTypeF: () ⇒ DataType, nullable: Boolean) extends LeafExpression with RefCodeGen with Product with Serializable
Getter; a trimmed version of NamedLambdaVariable, as it should never be resolved.
-
case class
RefSetterExpression(children: Seq[Expression]) extends Expression with CodegenFallback with Product with Serializable
Wraps other expressions and stores the result in a RefExpression.
-
case class
RunAllReturnLast(children: Seq[Expression]) extends Expression with CodegenFallback with Product with Serializable
Runs all of the children and returns the last child's eval result; this allows lambdas to be stitched together with aggregates.
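The getter/setter/sequence trio can be sketched with a mutable cell. A setter evaluates its child and stores the result. A getter reads the cell. A run-all node evaluates each child in order and keeps only the last result, which is enough to express an accumulator step. `Cell`, `Ref`, `RefSetter`, `AddInts` and this `RunAllReturnLast` are hypothetical toys loosely modelled on RefExpression, RefSetterExpression and RunAllReturnLast, not their real implementations.

```scala
object RefSketch {
  // A mutable cell standing in for the storage behind a RefExpression (hypothetical).
  final class Cell(var value: Any = null)

  sealed trait Expr { def eval(): Any }
  final case class Lit(v: Any) extends Expr { def eval(): Any = v }
  // Getter: reads whatever was last stored in the cell.
  final case class Ref(cell: Cell) extends Expr { def eval(): Any = cell.value }
  // Setter: evaluates the wrapped expression and stores its result in the cell.
  final case class RefSetter(cell: Cell, child: Expr) extends Expr {
    def eval(): Any = { val r = child.eval(); cell.value = r; r }
  }
  final case class AddInts(l: Expr, r: Expr) extends Expr {
    def eval(): Any = l.eval().asInstanceOf[Int] + r.eval().asInstanceOf[Int]
  }
  // Evaluates every child in order for its side effects and returns the last result.
  final case class RunAllReturnLast(children: Seq[Expr]) extends Expr {
    def eval(): Any = children.map(_.eval()).last
  }
}
```

Running `RunAllReturnLast(Seq(step, step, Ref(acc)))` with `step = RefSetter(acc, AddInts(Ref(acc), Lit(5)))` and `acc` starting at 0 adds 5 twice and then reads the cell, giving 10.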
- trait SeqArgs extends AnyRef
Value Members
- object FunCall
-
object
LambdaCompilationUtils
Functionality related to LambdaCompilation. Seemingly all HigherOrderFunctions use a lazy val match to extract the NamedLambdaVariables from the Spark LambdaFunction after bind has been called. When doGenCode is called, eval _could_ already have been called and the lazy val evaluated; as such, simply rewriting the tree may not fully work. Additionally, the type for NamedLambdaVariable is bound in the lazy vals, which means _ANY_ HigherOrderFunction may not tolerate swapping out NamedLambdaVariables for another NamedExpression.
To add to the fun, the open-source Spark HoFs all use CodegenFallback, as does NamedLambdaVariable, so it's possible to swap out some of these implementations if an array_transform is nested in a Fun1 or Fun2. Similarly, a Fun1 can call a Fun2, so the assumptions for each Fun1/FunN doGenCode are:
1. Use the processLambda function to evaluate the function.
2. compilationHandlers uses the quality.lambdaHandlers environment variable to load a comma-separated list of fqn=handler pairs.
3. For each fully qualified class name pair (e.g. org.apache.spark.sql.catalyst.expressions.ZipWith=handler.fqn), the handler is loaded.
4. processLambda then evaluates the expression tree and, for each matching HoF class name, calls the handler.
5. Handlers perform the custom doGenCode for that expression rather than the default OSS CodegenFallback.
6. Handlers return the ExprCode AND a list of NamedLambdaVariables that must have .value.set called on them (i.e. the ones that cannot be optimised).
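Steps 4 to 6 can be sketched on a toy tree. The walk looks up each node's class name in a handler map. A matching handler owns its whole subtree and reports the variables that still need their value set. Unmatched nodes are recursed into. All names here (`Node`, `ZipWithNode`, `Handler`, `processLambda`, `fallback`, `compiled`) are hypothetical, and generated code is modelled as a plain String rather than ExprCode.

```scala
object ProcessLambdaSketch {
  sealed trait Node { def children: Seq[Node] }
  final case class LambdaVar(name: String) extends Node { def children: Seq[Node] = Nil }
  final case class ZipWithNode(children: Seq[Node]) extends Node
  final case class TransformNode(children: Seq[Node]) extends Node
  final case class OtherNode(children: Seq[Node]) extends Node

  // A handler returns generated code (a String here) plus the lambda variables
  // that still need their value set at runtime.
  type Handler = Node => (String, Seq[LambdaVar])

  // Walks the tree: a node whose class name has a registered handler is delegated to
  // that handler (which owns its whole subtree); other nodes are recursed into.
  def processLambda(n: Node, handlers: Map[String, Handler]): Seq[LambdaVar] =
    handlers.get(n.getClass.getName) match {
      case Some(h) => h(n)._2
      case None    => n.children.flatMap(processLambda(_, handlers))
    }

  def allVars(n: Node): Seq[LambdaVar] = n match {
    case v: LambdaVar => Seq(v)
    case other        => other.children.flatMap(allVars)
  }

  // Like a fallback handler: interpreted eval is kept, so every variable must be set.
  val fallback: Handler = n => ("/* fallback */", allVars(n))
  // A hypothetical handler that compiles its subtree fully and needs no variables set.
  val compiled: Handler = _ => ("/* inlined */", Nil)
}
```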
NB: The fqn will also be used to check for 'named' lambdas used through registerLambdaFunctions.
https://github.com/apache/spark/pull/21954 introduced the lambda variable with an AtomicReference, along with its inherent performance hit, and, due to the difficulty of threading the holder through the expression chain, it did not have a compilation approach. Once the holder has been threaded through and bind has been called, the variable id is stable, as is the AtomicReference, so it can be swapped out for a simple variable in the same object.
quality.lambdaHandlers overrides the default for a given platform on an fqn basis, so you only need to "add" or "replace" the HoFs that cause issues (for example TransformValues), not the entire list of OSS HigherOrderFunctions. Note that some versions of Databricks provide compilation of their HoFs that may not be compatible with this approach.
Disable this approach by using quality.lambdaHandlers to map FunN to the default DoCodegenFallbackHandler: quality.lambdaHandlers=org.apache.spark.sql.qualityFunctions.FunN=org.apache.spark.sql.qualityFunctions.DoCodegenFallbackHandler
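The per-fqn override of quality.lambdaHandlers can be sketched as parsing the comma-separated fqn=handler pairs and layering them over the platform defaults. The parsing details are assumptions: whitespace is trimmed, blank entries are skipped, and only the first '=' splits a pair. `parse` and `effectiveHandlers` are hypothetical, not Quality's compilationHandlers.

```scala
object LambdaHandlersConfigSketch {
  // Parses "fqn=handler,fqn2=handler2" into a map. Blank entries are ignored and
  // whitespace around names is trimmed (assumptions, not documented behaviour).
  def parse(value: String): Map[String, String] =
    value.split(",").iterator.map(_.trim).filter(_.nonEmpty).map { pair =>
      pair.split("=", 2) match {
        case Array(fqn, handler) if fqn.trim.nonEmpty && handler.trim.nonEmpty =>
          fqn.trim -> handler.trim
        case _ =>
          throw new IllegalArgumentException(s"expected fqn=handler but got '$pair'")
      }
    }.toMap

  // Entries from the property override the platform defaults per fqn; other defaults remain.
  def effectiveHandlers(defaults: Map[String, String], property: Option[String]): Map[String, String] =
    defaults ++ property.map(parse).getOrElse(Map.empty[String, String])
}
```

With the disabling value shown above, only the FunN entry changes and every other default handler is kept. This is why only the problematic HoFs need to be listed.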
- object LambdaFunctions
- object MapTransform extends Serializable
- object SafeUTF8
- object SeqArgs