package id
Type Members
- case class AsBase64Fields(children: Seq[Expression]) extends Expression with InputTypeChecks with CodegenFallback with Product with Serializable
Generates a base64 string for an id structure; the first argument is an int, the others are longs
- case class AsBase64Struct(child: Expression) extends UnaryExpression with IDStructChecker with CodegenFallback with Product with Serializable
Generates a base64 string for an id structure
- sealed trait BaseWithLongs extends ID
Represents an ID using a base, which holds the length and a leading 4 bit type, and an array of longs
- case class GenericLongBasedID(idType: IDType, array: Array[Long]) extends BaseWithLongs with Product with Serializable
Represents a GenericLongBasedID
- case class GenericLongBasedIDExpression(id: IDType, child: Expression, prefix: String) extends UnaryExpression with CodegenFallback with Product with Serializable
Delegates ID creation to some other expression which must provide an array of longs result.
- id
type of the GenericLong compatible ID to be generated
- child
an expression generating an Array Of Longs
- prefix
how to name the field
- trait GenericLongBasedImports extends AnyRef
- case class GuaranteedUniqueID(mac: Array[Byte] = model.localMAC, ms: Long = System.currentTimeMillis() - model.guaranteedUniqueEpoch, partition: Int = 0, row: Int = 0) extends BaseWithLongs with Serializable with Product
Represents a guaranteed globally unique 160 bit ID within Spark, based on Twitter's 64 bit snowflake id:
  - leading 4 bits: type of id
  - padded 4 bits: for future use / ease of code
  - next 48 bits: MAC address of the driver's host network card (unique for a run with no central server / service requirement)
  - next 32 bits: partition id
  - next 41 bits: ms since 20210101
  - padded 7 bits: for future use / ease of code
  - remaining 24 bits: partition specific incremented row id, allowing 16777216 rows per ms, which is unlikely to occur and easy to manage overflow either way
This organisation allows the base and first long to be fixed at partition start or reset, with only the last long changing per row, which eases bit fiddling. On overflow the header remains untouched, and only then must the 41 bit ms value be re-evaluated, reducing clock slow downs (coincidentally around 41ms).
Full time synchronisation across all clusters in a business is not required, as the MAC address provides driver uniqueness (assuming your cloud / network provider guarantees this uniqueness for routing). The partition id provides segregation across a given action, with the timestamp ensuring repeated runs on the same cluster do not overlap.
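The field widths listed for GuaranteedUniqueID can be checked with a small sketch. It packs the fields, in the order given, into a single 160 bit BigInt and unpacks them again. The object name `SnowflakeLayoutSketch`, the field labels and the use of BigInt are all assumptions made for illustration; how the library itself splits these bits across the base and the two longs is not shown here.

```scala
object SnowflakeLayoutSketch {
  // Field widths in bits, in the order the description lists them.
  // The labels are hypothetical names chosen for this illustration.
  val widths: Seq[(String, Int)] = Seq(
    "type" -> 4, "pad1" -> 4, "mac" -> 48, "partition" -> 32,
    "ms" -> 41, "pad2" -> 7, "row" -> 24)

  // Sum of all field widths; the description states 160 bits.
  val totalBits: Int = widths.map(_._2).sum

  // Maximum number of distinct row ids per ms for a 24 bit row field.
  val rowsPerMs: BigInt = BigInt(1) << 24

  // Packs the named fields, most significant field first. Missing fields are 0.
  def pack(values: Map[String, BigInt]): BigInt =
    widths.foldLeft(BigInt(0)) { case (acc, (name, width)) =>
      val v = values.getOrElse(name, BigInt(0))
      require(v >= 0 && v.bitLength <= width, s"$name does not fit in $width bits")
      (acc << width) | v
    }

  // Unpacks by walking the fields from the least significant end.
  def unpack(packed: BigInt): Map[String, BigInt] =
    widths.reverse.foldLeft((packed, Map.empty[String, BigInt])) {
      case ((rest, out), (name, width)) =>
        val mask = (BigInt(1) << width) - 1
        (rest >> width, out + (name -> (rest & mask)))
    }._2
}
```

Because only the row field sits at the least significant end, incrementing the row id leaves every other field unchanged until the 24 bits overflow.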
- trait GuaranteedUniqueIDImports extends AnyRef
- class GuaranteedUniqueIDOps extends AnyRef
Manipulations on the long values directly for use in expressions etc. Use base64 to marshall as needed
- case class GuaranteedUniqueIdIDExpression(idBase: GuaranteedUniqueID, prefix: String) extends LeafExpression with CodegenFallback with NondeterministicLike with Product with Serializable
Delegates ID creation to some other expression which must provide an array of longs result.
- idBase
provides the entrance MAC, header, partition and row information
- prefix
how to name the fields
- sealed trait ID extends AnyRef
Represents an extensible ID starting with 160bits.
There are four known usages:
  - Rng (128bit PRNG using the rng function)
  - Data Vault (128bit MD5 based on user provided columns)
  - Snowflake (160bit globally unique identifier based on Twitter's approach to unique IDs with a Spark twist)
  - Provided (upstream provided identifier - variable bit length + 4 for type)
RNG / Provided and Field based are all the same implementation.
Implementations must use the first 4 bits to signify which of the four base types is used, plus any extension.
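Reading and writing a 4 bit type marker can be sketched as follows. The sketch assumes the base is a 32 bit Int and that "first 4 bits" means its most significant 4 bits. The object `IdTypeBitsSketch` and its method names are hypothetical and are not part of this package.

```scala
object IdTypeBitsSketch {
  // Stores a 4 bit type marker in the top 4 bits of a 32 bit base,
  // keeping the lower 28 bits of `rest` (assumed layout, for illustration).
  def withType(typeBits: Int, rest: Int): Int = {
    require(typeBits >= 0 && typeBits <= 0xF, "type must fit in 4 bits")
    (typeBits << 28) | (rest & 0x0FFFFFFF)
  }

  // Reads the marker back; the unsigned shift avoids sign extension
  // when the top bit of the base is set.
  def typeOf(base: Int): Int = base >>> 28

  // The 28 bits remaining after the type marker.
  def restOf(base: Int): Int = base & 0x0FFFFFFF
}
```

The unsigned shift `>>>` matters here: with a signed shift, any type value of 8 or above would read back as a negative number.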
- case class IDFromBase64(child: Expression, size: Int) extends UnaryExpression with InputTypeChecks with CodegenFallback with Product with Serializable
Generates an unprefixed 'raw' id structure of a given size. Note that the size is fixed: the type can't change during the plan.
- child
the base64 strings, which must all have the same size; returns null if a string is not the right size or cannot be parsed.
- size
the number of longs the id should have; 2 longs gives 160 bits and is the default
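The fixed size rule can be illustrated with a minimal round trip sketch using only the JDK. It assumes an id is an Int base followed by `size` longs, written big-endian and then base64 encoded; the actual byte layout used by this package is not stated here, and `Base64IdSketch` is a hypothetical name. The sketch returns `None` where the expression is documented to return null.

```scala
import java.nio.ByteBuffer
import java.util.Base64
import scala.util.Try

object Base64IdSketch {
  // Encodes an Int base and an array of longs as one base64 string
  // (assumed layout: 4 bytes of base, then 8 bytes per long).
  def encode(base: Int, longs: Array[Long]): String = {
    val buf = ByteBuffer.allocate(4 + 8 * longs.length)
    buf.putInt(base)
    longs.foreach(l => buf.putLong(l))
    Base64.getEncoder.encodeToString(buf.array())
  }

  // Decodes only when the string parses and holds exactly `size` longs;
  // anything else yields None.
  def decode(s: String, size: Int = 2): Option[(Int, Array[Long])] =
    Try(Base64.getDecoder.decode(s)).toOption
      .filter(_.length == 4 + 8 * size)
      .map { bytes =>
        val buf = ByteBuffer.wrap(bytes)
        val base = buf.getInt()
        (base, Array.fill(size)(buf.getLong()))
      }
}
```

With 2 longs the payload is 4 + 16 = 20 bytes, which is the 160 bits mentioned above; a string carrying 3 longs would be rejected by `decode(s, 2)`.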
- trait IDStructChecker extends Expression with InputTypeChecks
- case class IDToRawIDDataType(child: Expression) extends UnaryExpression with IDStructChecker with CodegenFallback with Product with Serializable
Converts any prefixed id back to rawType (base, i0, i1 etc.)
- sealed trait IDType extends Serializable
- case class InvalidIDType(idType: Byte) extends RuntimeException with Product with Serializable
- case class SizeOfIDString(child: Expression) extends UnaryExpression with InputTypeChecks with CodegenFallback with Product with Serializable
For a given id string, returns the length of the id in longs
Value Members
- object GenericLongBasedIDExpression extends Serializable
- object model
Model for ID handling