Random Number Generator: Difference between revisions

m
Bot: Replace deprecated source tag with syntaxhighlight
[unchecked revision][unchecked revision]
(Add adversarial entropy from Nullplan's version)
m (Bot: Replace deprecated source tag with syntaxhighlight)
 
(6 intermediate revisions by 3 users not shown)
Line 23:
There are roughly two kinds of applications: cryptography, and everything else.
 
In order to understand the following section, we first should review what cryptographic security means with respect to random number generators. Cryptographers distinguish two types of security: information-theoretic security and computational security. The former of these is the stronger claim. If a cryptosystem is information-theoretically secure, an attacker cannot say anything about an encrypted message without owning the key. Such cryptosystems are rare. Since the attacker must not be able to say '''anything''', everything including patterns, timing and even the length must be hidden. How can one even hide the length of a message? The only choices are to not send the message at all, which defeats the purpose of a cryptosystem, or to send an infinitely long message. So once you start sending, you can never stop.
In a cryptographic system it is often necessary to generate values that are absolutely not predictable. If an adversary knows what the next outputs will be, it will be very easy for them to guess the private keys you generate. This will, for example, compromise HTTPS traffic.
 
This is why computational security is the weaker, but more practical kind: the ciphertext must be difficult to find, ''given some reasonable computing power''. This limit is chosen with the expectation that it will not be reached in the coming decades. This also allows the user to "leak" some information about the ciphertext (the length, for example), as long finding the original input remains difficult. These systems usually use some operation that is difficult to invert unless the key is known, such as [https://en.wikipedia.org/wiki/RSA_(cryptosystem) factoring very large primes]. It should be obvious that the key must not be predictable: if an adversary knows what the next outputs will be, it will be very easy for them to guess keys until the correct one is found.
 
"True" random numbers are the best source for cryptographically-secure randomness, however it is difficult to build a large, quickly-replenished pool of entropy where each source is perfectly de-biased and cannot be influenced by the other sources. To "stretch" the entropy in a way that is difficult to reverse, one can use a cryptographically-secure random number generator (CSPRNG). CSPRNGs guarantee that it is computationally difficult to guess the next output having seen previous results, and, if the generator's state is known, which values preceded the known outputs.
Line 33 ⟶ 35:
Newer x86 and x86-64 processors have the instruction RDSEED for generating random numbers.
To use RDSEED you will first need to check if the instruction is available.
<sourcesyntaxhighlight lang="asm">
mov eax, 7 ; set EAX to request function 7
mov ecx, 0 ; set ECX to request subfunction 0
Line 39 ⟶ 41:
shr ebx, 18 ; the result we want is in EBX...
and ebx, 1 ; ...test for the flag of interest...
</syntaxhighlight>
</source>
 
If EBX is set to 1 (or ZF is unset), the instruction is available. Then it can be used to generate a 16/32/64 bit random number (depending on the register size used as an argument). If a random number was generated, the Carry flag is set.
 
<sourcesyntaxhighlight lang="asm">
mov ecx, 100 ; number of retries
.retry:
Line 53 ⟶ 55:
.done:
; random number is in EAX
</syntaxhighlight>
</source>
 
It is up to you and your users to trust RDSEED. Because it is implemented in hardware, it effectively is a black box that may contain all sorts of bugs, or worse, backdoors.
Line 75 ⟶ 77:
==== Adversarial entropy ====
If an adversary can somehow observe the state of the entropy pool and contribute their own entropy into the pool, then it is possible that they would provide entropy in such a way as to force the entropy pool into a lower entropy state. A simple example would be an entropy source periodically checking if a certain bit in the pool is set, and then providing entropy until it is clear. This way, for most of the time, that given bit is clear, and the number of possible states the entropy buffer can be in is halved. A more sophisticated attack is described on DJB's blog: https://blog.cr.yp.to/20140205-entropy.html
 
It is also partly for this reason that it is unwise to expose the entropy pool unmodified if the user requests a random number. If an adversary has access to the pool (either via a dedicated "add-entropy" interface or a sampled event source), it will be very easy to poison it. A common method used to hide the exact state is to hash (parts of) the pool in combination with counter, for instance the entropy counter, and a salt, using a cryptographically secure hashing function like SHA-256. Because these hash algorithms are difficult to invert, its inputs cannot be easily guessed. It is important to do this only if the pool has some entropy left.
 
== Cryptographically secure pseudorandom number generators ==
 
Now follow some CSPRNGs. It is important to remember that, as with everyingeverything cryptographic, it is best <em>not</em> to homebrew it if you are planning on actually using it. There are many ways things can go wrong, and the more complex the algorithm, the more chances of you making a mistake. Of course, for hobby uses it's perfectly fine; just don't go online banking with your handmade TLS key source.
 
=== x86 RDRAND Instruction ===
RDRAND is slightly older than RDSEED, and provides ([https://software.intel.com/content/www/us/en/develop/blogs/the-difference-between-rdrand-and-rdseed.html what Intel claims to be]) a CSPRNG.
Its presence is indicated by CPUID leaf 1, ECX bit 30:
<sourcesyntaxhighlight lang="asm">
mov eax, 1 ; set EAX to request function 1
mov ecx, 0 ; set ECX to request subfunction 0
Line 89 ⟶ 93:
shr ecx, 30 ; the result we want is in ECX...
and ecx, 1 ; ...test for the flag of interest...
</syntaxhighlight>
</source>
 
If ECX is set to 1 (or ZF is unset), the instruction is available. Usage is identical to RDSEED:
 
<sourcesyntaxhighlight lang="asm">
mov ecx, 100 ; number of retries
.retry:
Line 103 ⟶ 107:
.done:
; random number is in EAX
</syntaxhighlight>
</source>
 
It is automatically seeded by the same entropy source that RDSEED reads from, and cannot be seeded manually. It therefore is subject to the same caveats as RDSEED.
Line 112 ⟶ 116:
 
[https://www.bentasker.co.uk/blog/software-development/689-writing-a-chacha20-based-csprng Writing (and backdooring) a ChaCha20 based CSPRNG] may be an interesting article on the subject, and how it can go wrong in surprising ways.
 
== Pseudorandom number generators ==
 
Line 121 ⟶ 125:
Taken directly from the [http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf C standard document (p. 347)]:
 
<sourcesyntaxhighlight lang="c">
// The following functions define a portable implementation of rand and srand.
 
Line 136 ⟶ 140:
next = seed;
}
</syntaxhighlight>
</source>
 
This is a fairly standard but mediocre [https://en.wikipedia.org/wiki/Linear_congruential_generator Linear Congruential Generator] (LCG). It returns 15 random bits.
Line 167 ⟶ 171:
 
Note that the seed must never be zero. This also means that it is never possible for all registers to have bit value zero, and that of the 2<sup>n</sup> possible combinations of registers, the all-zero state is not allowed. Therefore, 2<sup>n</sup> - 1 states is the maximum possible.
 
=== Wichmann-Hill ===
In 1982, Brian Wichmann and David Hill [https://doi.org/10.2307/2347988 proposed] to combine three linear congruential generators, then normalizing and summing the results to get a number uniformly distributed between 0 and 1. A common instance is:
 
<syntaxhighlight lang="c">
static uint16_t seed[3]; /* seed with numbers between 1 and 30000 */
float wichmann_hill(void) {
seed[0] = (171 * seed[0]) % 30269;
seed[1] = (172 * seed[1]) % 30307;
seed[2] = (170 * seed[2]) % 30323;
return fmod(seed[0] / 30269.0 + seed[1] / 30307.0 + seed[2] / 30323.0, 1.0);
}
</syntaxhighlight>
 
This version is known to have a period of just shy of of seven trillion (the least common multiple of 30268, 30306, and 30322).
 
=== Mersenne Twister ===
Line 172 ⟶ 191:
The Mersenne Twister is an incredibly popular PRNG and is available on most platforms; for example, it is part of the set of generators required by the C++11 standard library. An implementation of the 64-bit MT-19937 version in C is as follows:
 
Implementation according to Wikipedia:
<source lang="c">
// MT-19937-64. Declare mt_seed and mt_rand extern in some header
// Based on mt19937ar.c found here: http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/MT2002/CODES/mt19937ar.c
 
<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>
#include <assert.h>
 
#ifdef USE32
#define W 64
typedef uint32_t word_t;
#define N 312
#define MSTATE_SIZE 156 624
#define RMIDDLE 31 397
#define INIT_SHIFT 30
#define INIT_FACT 1812433253
#define TWIST_MASK 0x9908b0df
#define SHIFT1 11
#define MASK1 0xffffffff
#define SHIFT2 7
#define MASK2 0x9d2c5680
#define SHIFT3 15
#define MASK3 0xefc60000
#define SHIFT4 18
#else
typedef uint64_t word_t;
#define STATE_SIZE 312
#define MIDDLE 156
#define INIT_SHIFT 62
#define TWIST_MASK 0xb5026f5aa96619e9
#define INIT_FACT 6364136223846793005
#define SHIFT1 29
#define MASK1 0x5555555555555555
#define SHIFT2 17
#define MASK2 0x71d67fffeda60000
#define SHIFT3 37
#define MASK3 0xfff7eee000000000
#define SHIFT4 43
#endif
 
#define ALOWER_MASK 0xB5026F5AA96619E9ULL 0x7fffffff
#define UUPPER_MASK 29 (~(word_t)LOWER_MASK)
static word_t state[STATE_SIZE];
#define D 0x5555555555555555ULL
static size_t index = STATE_SIZE + 1;
#define S 17
#define B 0x71D67FFFEDA60000ULL
#define T 37
#define C 0xFFF7EEE000000000ULL
#define L 43
#define F 6364136223846793005
 
static void seed(word_t s)
#define MASK_LOW ((1ULL << R) - 1)
{
#define MASK_UPP (~MASK_LOW)
index = STATE_SIZE;
 
static uint64_t state[N0] = s;
for (size_t i = 1; i < STATE_SIZE; i++)
static unsigned int index = N + 1;
state[i] = (INIT_FACT * (state[i - 1] ^ (state[i - 1] >> INIT_SHIFT))) + i;
}
 
static void mt_twisttwist(void) {
{
for (unsigned int i = 0; i < N; ++i) {
for (size_t i = 0; i < STATE_SIZE; i++)
uint64_t x = (state[i] & MASK_UPP) + (state[(i + 1) % N] & MASK_LOW);
{
uint64_t xa = x >> 1;
word_t x = (state[i] & UPPER_MASK) | (state[(i + 1) % STATE_SIZE] & LOWER_MASK);
if (x & 1) {
x = (x >> xa1) ^= A(x & 1? TWIST_MASK : 0);
state[i] = state[(i + MIDDLE) % STATE_SIZE] ^ x;
}
state[i] = state[(i + M) % N] ^ xa;
}
index = 0;
}
 
static word_t mt_random(void)
void mt_seed(uint64_t seed) {
{
state[0] = seed;
if (index >= N;STATE_SIZE)
{
for (unsigned int i = 1; i < N; ++i) {
assert(index == STATE_SIZE || !"Generator never seeded");
state[i] = F * (state[i - 1] ^ (state[i - 1] >> (W - 2))) + i;
twist();
}
}
 
word_t y = state[index];
uint64_t mt_rand(void) {
ify ^= (indexy >=> NSHIFT1) {& MASK1;
y if^= (indexy ><< NSHIFT2) {& MASK2;
y ^= (y << SHIFT3) & mt_seed(5489)MASK3;
y ^= y >> }SHIFT4;
mt_twist();
}
 
uint64_t y = state[index]++;
y ^= (y >> U) & D;
y ^= (y << S) & B;
y ^= (y << T) & C;
y ^= y >> L;
 
++index;
return y;
}</syntaxhighlight>
}
 
</source>
Some implementations automatically seed the generator with seed 5489, but this will (obviously) lead to the same outputs at every initialization.
 
 
It outperforms all PRNGs listed above, but it is rather slow due to its large state size. It also has problems in some statistical areas. See [https://arxiv.org/abs/1910.06437 "It is high time we let go of the Mersenne Twister"].
Line 254 ⟶ 289:
If you are still not satisfied, you may want to build your own PRNG. You can do this in a fairly trial-and-error way by writing something that looks good and subjecting it to automated statistical testing suites such as [http://simul.iro.umontreal.ca/testu01/tu01.html TestU01]'s SmallCrush, Crush and finally BigCrush. These tests are very easy to run and can quickly point out problems, for example:
 
<sourcesyntaxhighlight lang="c">
#include <unif01.h>
#include <bbattery.h>
Line 272 ⟶ 307:
unif01_DeleteExternGenBits(gen);
}
</syntaxhighlight>
</source>
 
SmallCrush will report that this generator failed 12 out of 15 statistical tests. The other tests, which are also much slower, are therefore not necessary.