LambdaCompilationUtils

object LambdaCompilationUtils

Functionality related to lambda compilation. Seemingly all HigherOrderFunctions use a lazy val with a pattern match to extract the NamedLambdaVariables from the Spark LambdaFunction after bind has been called. By the time doGenCode is called, eval _could_ already have been called and the lazy val evaluated, so simply rewriting the tree may not fully work. Additionally, the type of each NamedLambdaVariable is bound in the lazy vals, which means _ANY_ HigherOrderFunction may not tolerate swapping out NamedLambdaVariables for another NamedExpression.

To add to the fun, the open source Spark HoFs all use CodegenFallback, as does NamedLambdaVariable, so it is possible to swap out some of these implementations if, for example, an array_transform is nested in a Fun1 or Fun2. Similarly, a Fun1 can call a Fun2, so the assumptions for each Fun1/FunN doGenCode are:

1. Use the processLambda function to evaluate the function
2. compilationHandlers uses the quality.lambdaHandlers environment variable to load a comma separated list of fqn=handler pairs
3. each fully qualified class name pair (e.g. org.apache.spark.sql.catalyst.expressions.ZipWith=handler.fqn) handler is loaded
4. processLambda then evaluates the expression tree, for each matching HoF classname it will call the handler
5. handlers are used to perform the custom doGenCode for that expression rather than the default OSS CodegenFallback
6. handlers return the ExprCode AND a list of NamedLambdaVariables who must have .value.set called upon them (e.g. we can't optimise them)

NB: the fqn will also be used to check for named lambdas registered through registerLambdaFunctions.

https://github.com/apache/spark/pull/21954 introduced the lambda variable backed by an AtomicReference, along with its inherent performance hit, and, due to the difficulty of threading the holder through the expression chain, did not provide a compilation approach. Once the holder is threaded through and bind has been called, the variable id is stable, as is the AtomicReference, so it can be swapped out for a simple variable in the same object.
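As a rough illustration of the difference described above, the following self-contained sketch contrasts reading a lambda argument through a shared AtomicReference holder with reading it from a plain local variable. It uses no Spark classes; the object name LambdaVariableSketch and both method names are hypothetical and exist only for this example, and it is not the code this library generates.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical illustration only, not the library's generated code
object LambdaVariableSketch {

  // Interpreted style: the lambda body reads its argument through a shared holder,
  // so every element costs a set, a get and a cast
  def sumDoubledViaHolder(values: Array[Int]): Long = {
    val holder = new AtomicReference[Any]()
    val body: () => Int = () => holder.get().asInstanceOf[Int] * 2
    var total = 0L
    values.foreach { v =>
      holder.set(v)
      total += body()
    }
    total
  }

  // Compiled style: the same argument lives in a plain local variable
  def sumDoubledViaLocal(values: Array[Int]): Long = {
    var total = 0L
    var i = 0
    while (i < values.length) {
      val lambdaVar = values(i)
      total += lambdaVar * 2
      i += 1
    }
    total
  }
}
```

Both methods compute the same result; only the way the lambda argument is passed differs, which is the part the compilation approach aims to simplify.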

quality.lambdaHandlers overrides the default for a given platform on a per-fqn basis, so you only need to "add" or "replace" the HoFs that cause issues (for example TransformValues), not the entire list of OSS HigherOrderFunctions. Note that some versions of Databricks provide their own compilation of HoFs, which may not be compatible with this approach.

Disable this approach by using quality.lambdaHandlers to map FunN to the default DoCodegenFallbackHandler:
quality.lambdaHandlers=org.apache.spark.sql.qualityFunctions.FunN=org.apache.spark.sql.qualityFunctions.DoCodegenFallbackHandler

Linear Supertypes

AnyRef, Any


Type Members

trait LambdaCompilationHandler extends AnyRef
implicit class NamedLambdaVariableOps extends AnyRef

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def compilationHandlers: Map[String, LambdaCompilationHandler]
def convertToCompilationHandlers(objects: Map[String, Any] = loadLambdaCompilationHandlers()): Map[String, LambdaCompilationHandler]
val defaultGen: DoCodegenFallbackHandler
def envLambdaHandlers(env: String = getLambdaEnv): Map[String, String]
Parses lambda handlers from the quality.lambdaHandlers environment variable as a comma separated list of lambda|fqn=fqn pairs. The key can be either a lambda function name used in registerLambdaFunctions or a fully qualified class name referring to a HigherOrderFunction. The value must be the fully qualified class name of a LambdaCompilationHandler (a top level class is suggested). To disable compilation for a given lambda or HigherOrderFunction, use org.apache.spark.sql.qualityFunctions.DoCodegenFallbackHandler. You may also supply your own handler to rewrite compilation for HoF classes that do not provide it but are hotspots.
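The pair format described above can be sketched in plain Scala as follows. This is a minimal sketch under assumptions, not the library's implementation: the object HandlerParsingSketch and the method parseHandlers are hypothetical names, and silently skipping malformed entries is a choice made for this example rather than documented behaviour.

```scala
// Hypothetical helper, not the library's envLambdaHandlers implementation
object HandlerParsingSketch {

  // Splits "key=value,key=value" into a Map.
  // Entries without both a non-empty key and a non-empty value are skipped (an assumption).
  def parseHandlers(env: String): Map[String, String] =
    Option(env).getOrElse("")
      .split(",")
      .toList
      .map(_.split("=", 2).map(_.trim))
      .collect {
        case Array(key, value) if key.nonEmpty && value.nonEmpty => key -> value
      }
      .toMap
}
```

For instance, passing the string "org.apache.spark.sql.catalyst.expressions.ZipWith=handler.fqn" yields a single entry keyed by the ZipWith class name.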
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
def getLambdaEnv: String
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
val lambdaENV: String
def loadLambdaCompilationHandlers(lambdaHandlers: Map[String, String] = envLambdaHandlers()): Map[String, Any]
Combines the defaultLambdaHandlers for a Spark version with environment specific overrides.
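Overriding on a per-fqn basis can be pictured as a simple map merge in which override entries win. The sketch below assumes that reading of the description; HandlerMergeSketch, merge and the handler name example.handlers.TransformValuesHandler are hypothetical and used only for illustration.

```scala
// Hypothetical illustration of "defaults plus per-key overrides", not the library's code
object HandlerMergeSketch {

  // Entries in overrides replace entries in defaults that share the same key;
  // keys present in only one of the maps are kept as they are
  def merge(defaults: Map[String, String], overrides: Map[String, String]): Map[String, String] =
    defaults ++ overrides

  // Made-up defaults for the example
  val exampleDefaults: Map[String, String] = Map(
    "org.apache.spark.sql.catalyst.expressions.TransformValues" -> "example.handlers.TransformValuesHandler",
    "org.apache.spark.sql.catalyst.expressions.ZipWith" -> "handler.fqn"
  )

  // Disables compilation for TransformValues only, leaving ZipWith untouched
  val exampleOverrides: Map[String, String] = Map(
    "org.apache.spark.sql.catalyst.expressions.TransformValues" ->
      "org.apache.spark.sql.qualityFunctions.DoCodegenFallbackHandler"
  )
}
```

With these inputs, only the TransformValues entry changes, which matches the idea that you list just the HoFs you want to add or replace.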
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def processLambda(expr: Expression, ctx: CodegenContext, ev: ExprCode): Expression
Replaces all NamedLambdaVariables, for expressions which are not in compilationHandlers, with a simple offset generator.
If a handler exists for an expression (i.e. it should be skipped), its NamedLambdaVariables are returned, as these must be kept.
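The rewrite described above can be pictured with a toy expression tree. This is only a sketch under stated assumptions: it uses no Spark types, returns a tuple for simplicity (the real method returns an Expression), and every name in it (ProcessLambdaSketch, Expr, LambdaVar, LocalVar, Hof, process, collectVars) is hypothetical.

```scala
// Toy model only; none of these types exist in Spark or in this library
object ProcessLambdaSketch {
  sealed trait Expr
  final case class LambdaVar(name: String) extends Expr   // stand-in for NamedLambdaVariable
  final case class LocalVar(name: String) extends Expr    // stand-in for a simple generated variable
  final case class Hof(className: String, children: List[Expr]) extends Expr

  // Returns the rewritten tree plus the lambda variables that were left untouched
  // because they sit under a node whose class name has a handler
  def process(expr: Expr, handled: Set[String]): (Expr, List[LambdaVar]) = expr match {
    case v: LambdaVar => (LocalVar(v.name), Nil)
    case l: LocalVar => (l, Nil)
    case h @ Hof(cls, _) if handled(cls) => (h, collectVars(h)) // skip the subtree, keep its variables
    case Hof(cls, children) =>
      val (newChildren, kept) = children.map(process(_, handled)).unzip
      (Hof(cls, newChildren), kept.flatten)
  }

  // Gathers every lambda variable in a subtree
  def collectVars(expr: Expr): List[LambdaVar] = expr match {
    case v: LambdaVar => List(v)
    case _: LocalVar => Nil
    case Hof(_, children) => children.flatMap(collectVars)
  }
}
```

In this model a variable directly under an unhandled node becomes a LocalVar, while variables under a handled node are returned unchanged so the caller can still set their values.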
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
