o

org.apache.spark.sql.qualityFunctions

LambdaCompilationUtils

object LambdaCompilationUtils

Functionality related to LambdaCompilation. Seemingly all HigherOrderFunctions use a lazy val match to extract the NamedLambdaVariable's from the spark LambdaFunction after bind has been called. When doGenCode is called eval _could_ have been called and the lazy val evaluated, as such simply rewriting the tree may not fully work. Additionally the type for NamedLambdaVariable is bound in the lazy val's which means _ANY_ HigherOrderFunction may not tolerate swapping out NamedLambdaVariables for another NamedExpression.

To add to the fun OpenSource Spark HoF's all use CodegenFallback, as does NamedLambdaVariable, so it's possible to swap out some of these implementations if an array_transform is nested in a Fun1 or Fun2. Similarly Fun1's can call Fun2 so the assumptions are for each Fun1/FunN doCodeGen:

1. Use the processLambda function to evaluate the function 2. compilationHandlers uses the quality.lambdaHandlers environment variable to load a comma separated list of fqn=handler pairs 3. each fully qualified class name pair (e.g. org.apache.spark.sql.catalyst.expressions.ZipWith=handler.fqn) handler is loaded 4. processLambda then evaluates the expression tree, for each matching HoF classname it will call the handler 5. handlers are used to perform the custom doGenCode for that expression rather than the default OSS CodegenFallback 6. handlers return the ExprCode AND a list of NamedLambdaVariables who must have .value.set called upon them (e.g. we can't optimise them)

NB The fqn will also be used to check for named' lambdas used through registerLambdaFunctions.

https://github.com/apache/spark/pull/21954 introduced the lambdavariable with AtomicReference, it's inherent performance hit and, due to the difficulty of threading the holder through the expression chain did not have a compilation approach. After it's threaded and bind has been called the variable id is stable as is the AtomicReference, as such it can be swapped out for a simple variable in the same object.

quality.lambdaHandlers will override the default for a given platform on an fqn basis, so you only need to "add" or "replace" the HoFs that cause issue not the entire list of OSS HigherOrderFunctions for example TransformValues. Note that some versions of Databricks provide compilation of their HoF's that may not be compatible in approach.

Disable this approach by using the quality.lambdaHandlers to disable FunN with the default DoCodegenFallbackHandler: quality.lambdaHandlers=org.apache.spark.sql.qualityFunctions.FunN=org.apache.spark.sql.qualityFunctions.DoCodegenFallbackHandler

Linear Supertypes
AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. LambdaCompilationUtils
  2. AnyRef
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Type Members

  1. trait LambdaCompilationHandler extends AnyRef
  2. implicit class NamedLambdaVariableOps extends AnyRef

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. def compilationHandlers: Map[String, LambdaCompilationHandler]
  7. def convertToCompilationHandlers(objects: Map[String, Any] = loadLambdaCompilationHandlers()): Map[String, LambdaCompilationHandler]
  8. val defaultGen: DoCodegenFallbackHandler
  9. def envLambdaHandlers(env: String = getLambdaEnv): Map[String, String]

    Parses lambda handlers from the quality.lambdaHandlers environment variable as a comma separated lambda|fqn=fqn pair.

    Parses lambda handlers from the quality.lambdaHandlers environment variable as a comma separated lambda|fqn=fqn pair. The key can be either a lambda function name used in registerLambdaFunctions or a fully qualified class name referring to a HigherOrderFunction. The value must be a fully qualified class name for a LambdaCompilationHandler (it is suggested you use a top level class). If you wish to disable compilation for a given lambda or HigherOrderFunction use org.apache.spark.sql.qualityFunctions.DoCodegenFallbackHandler You may use your own to re-write compilation for HoF classes that do not provide it but are hotspots.

  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  14. def getLambdaEnv: String
  15. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  17. val lambdaENV: String
  18. def loadLambdaCompilationHandlers(lambdaHandlers: Map[String, String] = envLambdaHandlers()): Map[String, Any]

    Combines the defaultLambdaHandlers for a spark version with environment specific overrides

  19. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  20. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  21. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  22. def processLambda(expr: Expression, ctx: CodegenContext, ev: ExprCode): Expression

    Replaces all NamedLambdaVariables for expressions which are not in compilationHandlers with a simple offset generator

    Replaces all NamedLambdaVariables for expressions which are not in compilationHandlers with a simple offset generator

    If a handler exists (i.e. something should be skipped) it's NamedLambdaVariables are returned as these must be kept.

  23. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  24. def toString(): String
    Definition Classes
    AnyRef → Any
  25. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()

Inherited from AnyRef

Inherited from Any

Ungrouped