PRNG Functions

The existing Spark rand function has a few limitations:

It only generates doubles
It has a fixed implementation
It only reseeds on each new partition, ignoring splittable / jumpable algorithms

The Quality pseudorandom generators produce either 128-bit values (two longs) or a configurable number of bytes and, as a result, do not suffer from precision issues. They also leverage Commons RNG RandomSource implementations, allowing users to choose the algorithm used.
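To make the precision point concrete, here is a minimal Python sketch (not Quality or Spark code). It assumes doubles are IEEE 754 values with a 53-bit significand, so a single double cannot carry a full 64-bit integer, whereas a pair of 64-bit values carries all 128 bits. The helper names split_128, join_128 and survives_double are hypothetical and exist only for this illustration.

```python
import secrets

MASK64 = (1 << 64) - 1


def split_128(value):
    """Split an unsigned 128-bit integer into (high, low) unsigned 64-bit halves."""
    return (value >> 64) & MASK64, value & MASK64


def join_128(high, low):
    """Rebuild the unsigned 128-bit integer from its two 64-bit halves."""
    return ((high & MASK64) << 64) | (low & MASK64)


def survives_double(n):
    """True when the integer n is unchanged after a round trip through a double."""
    return int(float(n)) == n


if __name__ == "__main__":
    value = secrets.randbits(128)
    high, low = split_128(value)
    # Two 64-bit halves keep every one of the 128 random bits.
    print(join_128(high, low) == value)
    # 2**53 + 1 needs 54 significant bits, so a double cannot hold it exactly.
    print(survives_double(2**53), survives_double(2**53 + 1))
```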

In addition, by leveraging .isJumpable and the resulting jump function, the Quality PRNG functions can benefit from the implementation's own approach to managing overlapping intervals across the cluster.
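The idea behind jumping can be sketched in Python as follows. This is an illustration only, under stated assumptions: it swaps in a simple 64-bit linear congruential generator (with Knuth's MMIX constants) instead of XO_RO_SHI_RO_128_PP, because an LCG's jump-ahead is short enough to verify by hand, and the stride parameter and the function names step, jump, partition_starts and stream are hypothetical. It is not how Quality or Commons RNG implement jumping; it only shows how jumping gives each partition its own non-overlapping slice of a single sequence.

```python
MASK64 = (1 << 64) - 1
MULT = 6364136223846793005  # Knuth MMIX LCG multiplier
INCR = 1442695040888963407  # Knuth MMIX LCG increment


def step(state):
    """Advance the generator by a single step."""
    return (MULT * state + INCR) & MASK64


def jump(state, n):
    """Advance the generator by n steps in O(log n) by composing affine maps."""
    acc_mult, acc_plus = 1, 0        # identity map: x -> x
    cur_mult, cur_plus = MULT, INCR  # map for 2**k steps, starting at k = 0
    while n > 0:
        if n & 1:
            # Apply the current 2**k step map on top of the accumulated map.
            acc_mult = (acc_mult * cur_mult) & MASK64
            acc_plus = (acc_plus * cur_mult + cur_plus) & MASK64
        # Compose the current map with itself to cover 2**(k + 1) steps.
        cur_plus = ((cur_mult + 1) * cur_plus) & MASK64
        cur_mult = (cur_mult * cur_mult) & MASK64
        n >>= 1
    return (acc_mult * state + acc_plus) & MASK64


def partition_starts(seed, partitions, stride):
    """One start state per partition, each `stride` steps after the previous."""
    return [jump(seed, p * stride) for p in range(partitions)]


def stream(state, count):
    """The first `count` states of the sequence beginning at `state`."""
    out = []
    for _ in range(count):
        out.append(state)
        state = step(state)
    return out


if __name__ == "__main__":
    starts = partition_starts(seed=42, partitions=4, stride=1000)
    slices = [set(stream(s, 1000)) for s in starts]
    # No value appears in more than one partition's slice.
    print(len(set().union(*slices)) == 4 * 1000)
```

Reseeding each partition independently gives no such guarantee, since two seeds may land on overlapping stretches of the same underlying sequence.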

RNG Expressions

rngBytes( [number of bytes to fill - defaults to 16], [RandomSource RNG Impl - defaults to 'XO_RO_SHI_RO_128_PP'], [seed - defaults to 0] ) - Uses Commons RNG to create byte arrays. Implementations can be plugged in, and when the seed is 0 the RNG's default seed generator is used. When a given RNG isJumpable, jumping is used for each partition where possible, improving both speed and statistical results.
rng( [RandomSource RNG Impl - defaults to 'XO_RO_SHI_RO_128_PP'], [seed - defaults to 0] ) - Uses Commons RNG to create 128-bit values as two longs. Implementations can be plugged in, and when the seed is 0 the RNG's default seed generator is used. When a given RNG isJumpable, jumping is used for each partition where possible, improving both speed and statistical results.
rngUUID( expr ) - Converts expr, either a byte array or two longs, into a UUID string. Its counterpart, longPairFromUUID, generates two longs from a UUID string.
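The relationship between two longs, 16 bytes and a UUID string can be sketched in plain Python. This is not the Quality implementation: the function names are hypothetical, and the (higher, lower) ordering, the big-endian byte layout and the signed-long handling are assumptions chosen to mirror how java.util.UUID treats its most and least significant bits.

```python
import uuid

MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed JVM-style long."""
    return value - (1 << 64) if value >= (1 << 63) else value


def uuid_from_long_pair(higher, lower):
    """Format two 64-bit longs (signed or unsigned) as a UUID string."""
    return str(uuid.UUID(int=((higher & MASK64) << 64) | (lower & MASK64)))


def uuid_from_bytes(data):
    """Format exactly 16 bytes (assumed big-endian) as a UUID string."""
    return str(uuid.UUID(bytes=bytes(data)))


def long_pair_from_uuid(text):
    """Recover the (higher, lower) signed longs from a UUID string."""
    value = uuid.UUID(text).int
    return to_signed64((value >> 64) & MASK64), to_signed64(value & MASK64)


if __name__ == "__main__":
    text = uuid_from_long_pair(0, 1)
    print(text)
    print(long_pair_from_uuid(text))
```

Because the 128 bits are copied verbatim in both directions, the conversion is lossless, which is what lets the UUID string and the long pair act as counterparts.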

Last update: March 27, 2023 09:08:01
Created: March 27, 2023 09:08:01