Packages

package id

Type Members

  1. case class AsBase64Fields(children: Seq[Expression]) extends Expression with InputTypeChecks with CodegenFallback with Product with Serializable

    Generates a base64 string for an ID structure passed as separate fields: the first argument is an int, the others are longs.

  2. case class AsBase64Struct(child: Expression) extends UnaryExpression with IDStructChecker with CodegenFallback with Product with Serializable

    Generates a base64 string for an ID structure.

  3. sealed trait BaseWithLongs extends ID

    Represents an ID using a base (with the length and a leading 4 bit type) and an array of longs.

  4. case class GenericLongBasedID(idType: IDType, array: Array[Long]) extends BaseWithLongs with Product with Serializable

    Represents a GenericLongBasedID

  5. case class GenericLongBasedIDExpression(id: IDType, child: Expression, prefix: String) extends UnaryExpression with CodegenFallback with Product with Serializable

    Delegates ID creation to some other expression which must provide an array of longs result.

    id: type of the GenericLong compatible ID to be generated

    child: an expression generating an array of longs

    prefix: how to name the field

  6. trait GenericLongBasedImports extends AnyRef
  7. case class GuaranteedUniqueID(mac: Array[Byte] = model.localMAC, ms: Long = System.currentTimeMillis() - model.guaranteedUniqueEpoch, partition: Int = 0, row: Int = 0) extends BaseWithLongs with Serializable with Product

    Represents a guaranteed globally unique 160 bit ID within Spark, based on Twitter's 64 bit snowflake ID:

      - leading 4 bits: type of ID
      - padded 4 bits: for future use / ease of code
      - next 48 bits: MAC address of the driver's host network card (unique per run, with no central server / service requirement)
      - next 32 bits: partition id
      - next 41 bits: ms since 20210101
      - padded 7 bits: for future use / ease of code
      - remaining 24 bits: partition specific incremented row id, allowing 16777216 rows per ms, which is unlikely to occur and whose overflow is easy to manage either way

    The organisation allows the base and first long to be fixed at the start of a partition reset, with only the last long changing per row, which eases bit fiddling. In the case of overflow the header remains untouched and only then must the 41 bit ms value be re-evaluated, reducing clock slow downs (coincidentally around 41ms).

    Full time synchronisation across all clusters in a business is not required, as the MAC address provides driver uniqueness (assuming your cloud / network provider guarantees this uniqueness for routing). The partition id provides segregation across a given action, with the timestamp ensuring repeated runs on the same cluster do not overlap.
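The bit budget described above can be checked with a small sketch. It is illustrative only: the object name `SnowflakeLayoutSketch`, the field names and the packing into a single `BigInt` are assumptions made for this example, and the library itself stores the ID as a base plus longs rather than as one number.

```scala
object SnowflakeLayoutSketch {
  // (hypothetical field name, width in bits), in the order the description lists them
  val fields: Seq[(String, Int)] = Seq(
    "idType"    -> 4,
    "padding1"  -> 4,
    "mac"       -> 48,
    "partition" -> 32,
    "ms"        -> 41,
    "padding2"  -> 7,
    "row"       -> 24
  )

  // Sum of all field widths, expected to match the documented 160 bits
  val totalBits: Int = fields.map(_._2).sum

  // Number of distinct row ids available per millisecond with a 24 bit counter
  val maxRowsPerMs: Long = 1L << 24

  // Approximate number of years a 41 bit millisecond counter can cover
  val msRangeYears: Double = (1L << 41).toDouble / (1000.0 * 60 * 60 * 24 * 365.25)

  /** Packs field values into one number, first listed field most significant. */
  def pack(values: Map[String, BigInt]): BigInt =
    fields.foldLeft(BigInt(0)) { case (acc, (name, width)) =>
      val v = values.getOrElse(name, BigInt(0))
      require(v >= 0 && v.bitLength <= width, s"$name does not fit in $width bits")
      (acc << width) | v
    }

  /** Reverses pack, peeling fields off from the least significant end. */
  def unpack(id: BigInt): Map[String, BigInt] =
    fields.foldRight((id, Map.empty[String, BigInt])) { case ((name, width), (rest, out)) =>
      val mask = (BigInt(1) << width) - 1
      (rest >> width, out + (name -> (rest & mask)))
    }._2
}
```

`SnowflakeLayoutSketch.totalBits` evaluates to 160 and `maxRowsPerMs` to 16777216, matching the figures in the description.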

  8. trait GuaranteedUniqueIDImports extends AnyRef
  9. class GuaranteedUniqueIDOps extends AnyRef

    Manipulations on the long values directly, for use in expressions etc. Use base64 to marshal as needed.

  10. case class GuaranteedUniqueIdIDExpression(idBase: GuaranteedUniqueID, prefix: String) extends LeafExpression with CodegenFallback with NondeterministicLike with Product with Serializable

    Delegates ID creation to some other expression which must provide an array of longs result.

    idBase: provides the entrance MAC, header, partition and row information

    prefix: how to name the fields

  11. sealed trait ID extends AnyRef

    Represents an extensible ID starting with 160 bits.

    There are four known usages:

      - Rng (128 bit PRNG using the rng function)
      - Data Vault (128 bit MD5 based on user provided columns)
      - Snowflake (160 bit globally unique identifier based on Twitter's approach to unique IDs with a Spark twist)
      - Provided (upstream provided identifier, variable bit length + 4 for type)

    Rng, Provided and field based IDs all share the same implementation.

    Implementations must use the first 4 bits to signify which of the four base types is used, plus any extension.
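The leading 4 bit type marker can be sketched as follows. This assumes a 32 bit base whose top 4 bits carry the type; the object name `IdTypeBitsSketch`, its helpers and the example type code are hypothetical, as the actual codes used by the library are not given here.

```scala
object IdTypeBitsSketch {
  // Reads the leading 4 bits of a 32 bit base (unsigned shift avoids sign extension)
  def typeBits(base: Int): Int = (base >>> 28) & 0xF

  // Writes a 4 bit type code into the leading bits, leaving the lower 28 bits untouched
  def withType(base: Int, idType: Int): Int = {
    require(idType >= 0 && idType <= 0xF, "type code must fit in 4 bits")
    (base & 0x0FFFFFFF) | (idType << 28)
  }
}
```

For example, `IdTypeBitsSketch.typeBits(IdTypeBitsSketch.withType(0x1234, 0xA))` returns 10, while the lower 28 bits still hold 0x1234.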

  12. case class IDFromBase64(child: Expression, size: Int) extends UnaryExpression with InputTypeChecks with CodegenFallback with Product with Serializable

    Generates an unprefixed 'raw' id structure of a given size. Note that the size is fixed: the type cannot change during the plan.

    child: the base64 strings, which must all have the same size; returns null if a string is not the right size or cannot be parsed

    size: the number of longs in the ID; 2 longs (160 bit) is the default
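The relationship between size and bit length, and the null-on-mismatch behaviour, can be illustrated with a minimal sketch. It assumes the ID is serialised as a 4 byte int base followed by 8 byte longs before base64 encoding; that byte layout, the object name `Base64IdSketch` and the use of `Option` in place of null are assumptions for this example, not the library's confirmed encoding.

```scala
import java.nio.ByteBuffer
import java.util.Base64
import scala.util.Try

object Base64IdSketch {
  // A 32 bit base plus `size` longs: size 2 gives 160 bits
  def bitLength(size: Int): Int = 32 + 64 * size

  // Serialises the base and longs (big endian) and base64 encodes the bytes
  def encode(base: Int, longs: Array[Long]): String = {
    val buf = ByteBuffer.allocate(4 + 8 * longs.length)
    buf.putInt(base)
    longs.foreach(l => buf.putLong(l))
    Base64.getEncoder.encodeToString(buf.array())
  }

  // Returns None when the string is not valid base64 or decodes to the wrong size
  def decode(s: String, size: Int): Option[(Int, Array[Long])] =
    Try(Base64.getDecoder.decode(s)).toOption
      .filter(_.length == 4 + 8 * size)
      .map { bytes =>
        val buf = ByteBuffer.wrap(bytes)
        val base = buf.getInt()
        (base, Array.fill(size)(buf.getLong()))
      }
}
```

A string encoded with one long and decoded with `size = 2` yields `None` in this sketch, mirroring the fixed-size requirement described above.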

  13. trait IDStructChecker extends Expression with InputTypeChecks
  14. case class IDToRawIDDataType(child: Expression) extends UnaryExpression with IDStructChecker with CodegenFallback with Product with Serializable

    Converts any prefixed id back to rawType (base, i0, i1 etc.)
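The raw field naming (base, i0, i1 etc.) can be sketched as below. Only the raw names come from the description; the underscore separator between prefix and raw name, and the object `RawFieldNamesSketch` with its helpers, are hypothetical and chosen purely for illustration.

```scala
object RawFieldNamesSketch {
  // Raw field names for an ID holding `size` longs: base, i0, i1, ...
  def rawNames(size: Int): Seq[String] =
    "base" +: (0 until size).map(i => s"i$i")

  // Hypothetical prefixing scheme: <prefix>_<rawName>
  def prefixed(prefix: String, size: Int): Seq[String] =
    rawNames(size).map(n => s"${prefix}_$n")

  // Strips the assumed "<prefix>_" from a field name to recover the raw name
  def toRaw(name: String, prefix: String): String =
    name.stripPrefix(prefix + "_")
}
```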

  15. sealed trait IDType extends Serializable
  16. case class InvalidIDType(idType: Byte) extends RuntimeException with Product with Serializable
  17. case class SizeOfIDString(child: Expression) extends UnaryExpression with InputTypeChecks with CodegenFallback with Product with Serializable

    For a given string, returns the length of the given id in longs.

Value Members

  1. object GenericLongBasedIDExpression extends Serializable
  2. object model

    Model for ID handling
