Why the Angband random number generator is currently limited to 28 bits explained

**rhutson** · August 16, 2016, 05:22

Originally posted by brbrbr

I am not developer, but very interested:

1) It's year 2016 and the 28 bit mutation remains. Maybe it's time to remove it from z-rand.c as rhutson suggested ????

Feel Freel

Originally posted by brbrbr

The bug means all randint0 generate 1 lower values.

once you have reached the Implementor level.

**rhutson** · August 16, 2016, 06:38

Originally posted by AnonymousHero

You're certainly right that using a while loop to do this is absolutely horrible for performance -- not sure why it's done this way (but see below). I'm sure there are much more efficient ways to do it, though I don't know the right search keywords to find an appropriate algorithm on Wikipedia.

The branch is rarely taken in my algorithm. I counted CPU cycles, and I would have never ever injected the inefficient abomination being described into Angband.

Requestfully, [someone] Git the source code, stick in global variables and report the ratio of successful test-branches / function calls. The ratio should be quite, quite low. (I have the source code, but the last time that I played Angband, I started at 6 pm. I had a level 20 character, but I was very sleepy. I looked at the time and wondered why I was so sleepy at only 3 am. Thirty minutes later, I realized that it was 3:30 pm and that I had been playing Angband for 21 1/2 hours. I'm too old for that and don't intend to play Angband again.)

While hardly a reference, the one - but not asserted only- person on this exchange http://stackoverflow.com/questions/1...mber-generator who can be trusted is Rob Napier. Java's implementation seems to me to be the most efficient, while I would say that my algorithm is slightly more efficient for the purposes of Angband than the arc4 algorithm. The performance differences between the 3 algorithms would not even be measurable in microseconds. The primary difference between the 3 algorithms is that I know that I was not paid to design my algorithm.

Originally posted by AnonymousHero

The "while (1)" method doesn't suffer from this bias.

I'm not sure if this actually matters in practice for a game like Angband, but it obviously matters for statistical applications.

Correct. Angband's current RNG implementation does not suffer from any bias.

My motivation for putting my algorithm into the game was so that I could sleep well knowing that tomorrow's prayers and spells would be answered or rejected efficiently and unbiasedly. There is little practical difference between the old fashioned mod % N method versus using an unbiased algorithm. I just couldn't sleep knowing that Angband's RNG was not perfectly unbiased.

**rhutson** · August 16, 2016, 07:50

Originally posted by brbrbr

Ok, that explains why simple modulo was not used

And you now know exactly why a simple modulo was inconsistent with my continued existence.

Originally posted by brbrbr

Now, I have an issue understanding the "Rand_div" code.
For example:

Code:

/* Partition size */
n = (0x10000000 / m);

The usual call for randint is with m=100(decimal), that means the partitions are not of same size, which MAY bias the random number generator.

You seem to me to be an eager student. I would simply share the headerless contents of the email that I sent to Ben explaining how the algorithm works, but I did not maintain a .record in those days. The easiest way for me to explain the algorithm to you would be to use pencil and paper and then scan and upload my diagrams.

Easier for me and as mind food for eager students, read Ben's comments in z-rand.c and then play around with a program that I wrote just today:

Code:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

uint32_t randit(uint32_t m)
{
    uint32_t n;

    n = (0x10000000 / m);

    return n;
}

int main(int argc, char *argv[])
{
    uint32_t i;

    assert(argc == 2);
    assert((i = atoi(argv[1])) >= 0);
    assert(i <= 0x10000000);

    printf("Part size %u\n", randit(i));
}

Then, please report back your findings! (I do apologize, and no one else would likely relate that my mind is mostly equally focused on the Pros and Cons of Hanging Guitars from a Vaulted Ceiling versus a floorful of guitar stands juxtaposed with the apparent impossiblity of replacing a John Deere lawn mower belt without removing the mower deck.)

Originally posted by brbrbr

It would be much cleaner and faster to use unbiased modulo code:

Code:

#define RAND_MAX=0xFFFFFFFF; 
while(1)
{
  r =  WELLRNG1024a();
  if (r < RAND_MAX - RAND_MAX % m)
    return r % m;
}

That code will run 1 time most of the time. Unbiased.

With parameters of unknown value at compile time, mod is still slightly more expensive than div.

**AnonymousHero** · August 16, 2016, 16:33

Originally posted by rhutson

(About efficiency of the algorithm.)

Right, I think I originally misread the code, but just never corrected myself.

**rhutson** · August 16, 2016, 18:34

Originally posted by AnonymousHero

Right, I think I originally misread the code, but just never corrected myself.

I saw this discussion back in June, but clearly, my attention has been attending to more important matters, hence my delayed response.

There is a Lurking Humor in last night's posts. I was a couple of hours into a detailed study comparing the efficiencies and estimating the CPU cycles used by the 3 algorithms mentioned along with speed testing code, and then I wisely decided to use Nick's approach of "Trust me. I wrote it."€ť

As for the efficiency of my algorithm, I was pleased to discover that it is practically as efficient as reference grade library routines. The algorithm is just more obscure since I was dealing with a different core RNG which was known to produce less than pseudorandom sequences in the lower bits.

As for the 28 bit limitation, that can now be safely and trivially bumped up to a more classic 31 bit limitation. If it is really an issue, email me, and I'll email you back some diffs.

But any major rewrite of known working code in such a vital section of the game for no obvious reason would be an ill-advised waste of time.

**AnonymousHero** · August 16, 2016, 23:33

Originally posted by rhutson

As for the 28 bit limitation, that can now be safely and trivially bumped up to a more classic 31 bit limitation. If it is really an issue, email me, and I'll email you back some diffs.

But any major rewrite of known working code in such a vital section of the game for no obvious reason would be an ill-advised waste of time.

Funnily enough, my interest in this was only tangential and coincidental to replacing the (really, really old) RNG in the ToME 2.x codebase. I've replaced it with a PCG family RNG which gives me a standard C++ interface which means i also get randnor() without any 'weird' tables, etc. It's all good from my perspective

.

I mostly wanted the replacement for the same somewhat perverse reasons as yourself, but I also want to reduce the amount of "in-house" code and since PCG is a header-only implementation, it's completely trivial to include it.

(I haven't yet pushed this to the public GH repo. I should probably start gathering commits to push...)

**rhutson** · August 16, 2016, 23:45

Good news everyone!

I morphed z-rand.c into a Rand_div() speed test and study program. My memory of prior testing and claim that any attempt at optimizing my algorithm any further could not even be measured in a full microseconds have been validated! And I expanded the range and domain of Rand_div into 31 bit territory as well.

My speed test and the loop test reults on a 2.6 GHz Intel Core i5 compiled with clang -O2 on OS X 10.11.6 :

Code:

Newton- 1:39PM-s009 ~/src/angband/src% time ./a.out 2
1000000000 calls, 1000000000 loop entries, test and branch success rate of  0.00000000%
./a.out 2  14.03s user 0.02s system 99% cpu 14.082 total
Newton- 1:40PM-s009 ~/src/angband/src% printf '< %.2f microseconds per call\n' $((14.03 / 1000000000 * 1000000))
< 0.01 microseconds per call
Newton- 1:40PM-s009 ~/src/angband/src% time ./a.out 100                                                         
1000000000 calls, 1000000024 loop entries, test and branch success rate of  0.00000240%
./a.out 100  14.00s user 0.02s system 99% cpu 14.046 total
Newton- 1:40PM-s009 ~/src/angband/src% time ./a.out 100000000
1000000000 calls, 1022613980 loop entries, test and branch success rate of  2.26139800%
./a.out 100000000  14.50s user 0.02s system 99% cpu 14.550 total
Newton- 1:42PM-s009 ~/src/angband/src% printf '< %.2f microseconds per call\n' $((14.5 / 1000000000 * 1000000))
< 0.01 microseconds per call
Newton- 1:42PM-s009 ~/src/angband/src% time ./a.out 100000001                                                  
1000000000 calls, 1022608062 loop entries, test and branch success rate of  2.26080620%
./a.out 100000001  14.50s user 0.02s system 99% cpu 14.554 total
Newton- 1:48PM-s009 ~/src/angband/src% time ./a.out 200000001
1000000000 calls, 1073732253 loop entries, test and branch success rate of  7.37322530%
./a.out 200000001  15.69s user 0.03s system 99% cpu 15.747 total
Newton- 1:49PM-s009 ~/src/angband/src% printf '< %.2f microseconds per call\n' $((15.69 / 1000000000 * 1000000))
< 0.02 microseconds per call
Newton- 1:49PM-s009 ~/src/angband/src% time ./a.out 2000000000                                                  
1000000000 calls, 1073726417 loop entries, test and branch success rate of  7.37264170%
./a.out 2000000000  15.52s user 0.02s system 99% cpu 15.570 total

My program called Rand_div() via the randint0 macro 1 billion times requesting a random number from 0 to n-1 where n is specified via command line. As you can see, the bias issue begins at not being an issue. As n becomes a relatively very large 2 billion, the loop can cycle at least once around 7.4% of the time. For the purposes of the Angband that I played, the unbiased algorithm added no measurable overhead during gameplay. (I can share my test program, but I am probably the only one interested in it.)

As for the idea of expanding the RNG from 28 bits to 31 bits, that's not my call. I've compiled and am "playing" 4.0.4. (It looks nice!) I'm on the town level, and I should perhaps stay in town for now.

The edit:

Actually, thinking about this further, there may be a bug in what my test program is reporting as "test and branch success rate", as that should be a little under 7%, not a little over 7%. Also, there are definitely some efficiency advantages to expanding the range of Rand_div() using that algorithm. I'm not sure if the matter is worth bothering with though. I'll ponder the matter later after I deal with the lawn mower belt.

**AnonymousHero** · August 17, 2016, 02:51

What about "perf" instead of "time"? (Regardless of whether that's better, "time" is definitely not a good way to benchmark anything if you're looking a variation at the sub-second level.)

"perf" will at least show you 'cycle counts'.

**rhutson** · August 17, 2016, 05:42

Originally posted by AnonymousHero

I mostly wanted the replacement for the same somewhat perverse reasons as yoursel

I have developed a "problem" with joking on technical forums. This can cause confusion and create credibility issues. I will try to restrain myself as you and others are interested in learning.

The reason that I became foucsed on the RNG was because changes that Ben made to the game slowed it down considerably. I profiled the game and noticed that the plurality (IIRC) of CPU time in the game was being spent in the RNG. Ben did "something" to reduce RNG calls while I focused on improving the efficiency of the RNG. Wanting the best for Angband, I added the algorithm to eliminate the inherent bias of the random() % N method (which was "satisfying").

BTW: My last Angband session was over 21 hours nonstop.

**rhutson** · August 17, 2016, 07:09

Originally posted by AnonymousHero

What about "perf" instead of "time"? (Regardless of whether that's better, "time" is definitely not a good way to benchmark anything if you're looking a variation at the sub-second level.

Newton- 9:50PM-s001 ~/src/angband/src% perf
zsh: command not found: perf

I did find the Wikipedia page for 'perf', but something like that is not needed for what I was/am doing. I did note, e.g., '< 0.02 microseconds per call' as zsh's builtin 'time' reported the CPU usage for the entire program. After that, the math was simple enough. (getrusage() still exists, but it was not needed.)

Oh, there was not a bug in my test program. I encountered a printf terminology issue while multitasking, and after my unedited post, I noticed that what I thought should be 6.7% was 7.4%. Needing to commute, I edited and declared a possible bug in my hastily composed test program.

With the 31 bit version of the RNG, there is a worst case 6.7% chance of the branch being taken. But if the branch is taken, there is another 6.7% chance of the branch being taken again ... (Unnecessary careful study of earlier posts indicate that I was quite aware of relooping. // again, multitasking)

The 28 bit vs 31 bit RNG efficiency study is interesting, but would be more so in later days.

**AnonymousHero** · August 17, 2016, 10:17

Originally posted by rhutson

Newton- 9:50PM-s001 ~/src/angband/src% perf
zsh: command not found: perf

I did find the Wikipedia page for 'perf', but something like that is not needed for what I was/am doing.

My point was merely that 'perf' can tell you exact (well, as exact as possible) numbers for many things you might be interested in. (Like branch prediction and such.)

**rhutson** · August 18, 2016, 00:30

Originally posted by AnonymousHero

My point was merely that 'perf' can tell you exact (well, as exact as possible) numbers for many things you might be interested in. (Like branch prediction and such.)

I understood your point, but I don't think that you understand what I am doing. 'perf' looks like a tool that I would be using if I was still playing and contributing to Angband while running Linux or even if there was a OS X version. 'perf' does appears to be a good tool for studying the performance of the game itself.

Here is stripped down code of the program which I am speed testing with zsh's time (/usr/bin/time yields essentially identical results) :

Code:

static int loops, calls;

// stripped z-rand31.c begins

        } else {
                ++calls;
                /* Use a complex RNG */
                while (1) {
                    ++loops;
// stripped z-rand31.c continues and ends

// omitted test program stuff

int main(int argc, char *argv[])
{
// omitted test program stuff which includes converting argv[1] to the integer n

    for (i = 0; i < BIG; ++i)        // I use a do-while loop, but this is more conventional.
    {
        if (randint0(n) >= n)
            die("Uh oh");
    }

    printf("%d calls, %d loop entries, (loops - calls) / calls * 100: %.8lf\n", calls, loops, (double)(loops - calls) / calls * 100);
}

The 'time' utility is the right tool for speed testing what I am doing. 'time' is reporting the CPU time of code besides the RNG, but that extra code uses trival CPU time, and it uses a constant amount of time. The constant extra trivial CPU time is what makes the 'time' utility the right tool for optimizing my spare time

, and, again, is also why I qualified the results with '<'.

I am now reminded of an amusing true story related as dialogue which I hope that you and some others appreciate:

"And so, the algorithm runs in `O(n log(n)^2) + 483' time."

"Professor, why did you include 483 in your analysis? I thought that we were supposed to removed constants from an analysis."

"Oh, you're right." // "+ 483" erased from chalkboard.

Addressing any concerns about the accuracy of a modern 'time', I do notice that Apple's unmaintained(?) man pages for /usr/bin/time do add a warning about the 'time' utilty:

BUGS
The granularity of seconds on microprocessors is crude and can result in times being reported for CPU usage which are too large by a second.

BSD June 6, 1993 BSD

That probably was true with BSD in 1993, and I do recall 'time' not having much granularity on 4.2 BSD. However, I've never seen such a bug of that magnitude under Linux or OS X (which I primarily switched to in 2008). I am getting results that are consistent enough for my purposes using getrusage() via zsh or /usr/bin/time:

Newton- 2:47PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000011 loop entries, (loops - calls) / calls * 100: 0.00000220
a.out 400 6.85s user 0.01s system 99% cpu 6.886 total
Newton- 2:47PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000014 loop entries, (loops - calls) / calls * 100: 0.00000280
a.out 400 6.86s user 0.01s system 99% cpu 6.890 total
Newton- 2:48PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000009 loop entries, (loops - calls) / calls * 100: 0.00000180
a.out 400 6.86s user 0.01s system 99% cpu 6.890 total
Newton- 2:48PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000014 loop entries, (loops - calls) / calls * 100: 0.00000280
a.out 400 6.94s user 0.02s system 99% cpu 6.983 total
Newton- 3:21PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000009 loop entries, (loops - calls) / calls * 100: 0.00000180
a.out 400 6.86s user 0.01s system 99% cpu 6.891 total
Newton- 3:21PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000009 loop entries, (loops - calls) / calls * 100: 0.00000180
a.out 400 6.86s user 0.02s system 99% cpu 6.901 total
Newton- 3:21PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000014 loop entries, (loops - calls) / calls * 100: 0.00000280
a.out 400 6.85s user 0.02s system 99% cpu 6.892 total

I do need to come up with a better name for the "(loops - calls) / calls * 100" metric, but at least it is now not "wrong", but just lacking in meaning.

(It's just a test program that I'm working on in my spare time.)

But remember, the main idea of the test program was to verify and assure that my unbiased algorithm was not looping a lot. 20 years ago or whenever, I used the "static int calls, loops;" method while testing the algorithm. I added a game command to occasionally print their values while playing the game. I don't recall the specifics, only that I did not observe any alarming looping. The shell speed measurements and numerical demonstration that randint0() has indeed been working correctly all of these years were add-ons.

P.S. Obviously, I enjoy discussing this subject with you. And I do need to check the looping performace of the 28-bit RNG.

PS2 The reason for the quick P.S. was that I needed to "disconnect", and I think that my post should have begun thusly:

"Thank you for sharing that info about 'perf'. However, I abandoned my 16 year Linux project in 2008 for personal reasons. Any advice or help offered by you or anyone else is always welcomed!"

Why the Angband random number generator is currently limited to 28 bits explained

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment