Why the Angband random number generator is currently limited to 28 bits explained
Collapse
X
-
You're certainly right that using a while loop to do this is absolutely horrible for performance -- not sure why it's done this way (but see below). I'm sure there are much more efficient ways to do it, though I don't know the right search keywords to find an appropriate algorithm on Wikipedia.
Requestfully, [someone] Git the source code, stick in global variables and report the ratio of successful test-branches / function calls. The ratio should be quite, quite low. (I have the source code, but the last time that I played Angband, I started at 6 pm. I had a level 20 character, but I was very sleepy. I looked at the time and wondered why I was so sleepy at only 3 am. Thirty minutes later, I realized that it was 3:30 pm and that I had been playing Angband for 21 1/2 hours. I'm too old for that and don't intend to play Angband again.)
While hardly a reference, the one - but not asserted only- person on this exchange http://stackoverflow.com/questions/1...mber-generator who can be trusted is Rob Napier. Java's implementation seems to me to be the most efficient, while I would say that my algorithm is slightly more efficient for the purposes of Angband than the arc4 algorithm. The performance differences between the 3 algorithms would not even be measurable in microseconds. The primary difference between the 3 algorithms is that I know that I was not paid to design my algorithm.
My motivation for putting my algorithm into the game was so that I could sleep well knowing that tomorrow's prayers and spells would be answered or rejected efficiently and unbiasedly. There is little practical difference between the old fashioned mod % N method versus using an unbiased algorithm. I just couldn't sleep knowing that Angband's RNG was not perfectly unbiased.Last edited by rhutson; August 16, 2016, 07:58.Comment
-
And you now know exactly why a simple modulo was inconsistent with my continued existence.
Easier for me and as mind food for eager students, read Ben's comments in z-rand.c and then play around with a program that I wrote just today:
Code:#include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <assert.h> uint32_t randit(uint32_t m) { uint32_t n; n = (0x10000000 / m); return n; } int main(int argc, char *argv[]) { uint32_t i; assert(argc == 2); assert((i = atoi(argv[1])) >= 0); assert(i <= 0x10000000); printf("Part size %u\n", randit(i)); }
With parameters of unknown value at compile time, mod is still slightly more expensive than div.Last edited by rhutson; August 16, 2016, 10:19.Comment
-
Comment
-
There is a Lurking Humor in last night's posts. I was a couple of hours into a detailed study comparing the efficiencies and estimating the CPU cycles used by the 3 algorithms mentioned along with speed testing code, and then I wisely decided to use Nick's approach of "Trust me. I wrote it."€ť
As for the efficiency of my algorithm, I was pleased to discover that it is practically as efficient as reference grade library routines. The algorithm is just more obscure since I was dealing with a different core RNG which was known to produce less than pseudorandom sequences in the lower bits.
As for the 28 bit limitation, that can now be safely and trivially bumped up to a more classic 31 bit limitation. If it is really an issue, email me, and I'll email you back some diffs. But any major rewrite of known working code in such a vital section of the game for no obvious reason would be an ill-advised waste of time.Last edited by rhutson; August 16, 2016, 19:11.Comment
-
As for the 28 bit limitation, that can now be safely and trivially bumped up to a more classic 31 bit limitation. If it is really an issue, email me, and I'll email you back some diffs. But any major rewrite of known working code in such a vital section of the game for no obvious reason would be an ill-advised waste of time.
I mostly wanted the replacement for the same somewhat perverse reasons as yourself, but I also want to reduce the amount of "in-house" code and since PCG is a header-only implementation, it's completely trivial to include it.
(I haven't yet pushed this to the public GH repo. I should probably start gathering commits to push...)Comment
-
Good news everyone!
I morphed z-rand.c into a Rand_div() speed test and study program. My memory of prior testing and claim that any attempt at optimizing my algorithm any further could not even be measured in a full microseconds have been validated! And I expanded the range and domain of Rand_div into 31 bit territory as well.
My speed test and the loop test reults on a 2.6 GHz Intel Core i5 compiled with clang -O2 on OS X 10.11.6 :
Code:Newton- 1:39PM-s009 ~/src/angband/src% time ./a.out 2 1000000000 calls, 1000000000 loop entries, test and branch success rate of 0.00000000% ./a.out 2 14.03s user 0.02s system 99% cpu 14.082 total Newton- 1:40PM-s009 ~/src/angband/src% printf '< %.2f microseconds per call\n' $((14.03 / 1000000000 * 1000000)) < 0.01 microseconds per call Newton- 1:40PM-s009 ~/src/angband/src% time ./a.out 100 1000000000 calls, 1000000024 loop entries, test and branch success rate of 0.00000240% ./a.out 100 14.00s user 0.02s system 99% cpu 14.046 total Newton- 1:40PM-s009 ~/src/angband/src% time ./a.out 100000000 1000000000 calls, 1022613980 loop entries, test and branch success rate of 2.26139800% ./a.out 100000000 14.50s user 0.02s system 99% cpu 14.550 total Newton- 1:42PM-s009 ~/src/angband/src% printf '< %.2f microseconds per call\n' $((14.5 / 1000000000 * 1000000)) < 0.01 microseconds per call Newton- 1:42PM-s009 ~/src/angband/src% time ./a.out 100000001 1000000000 calls, 1022608062 loop entries, test and branch success rate of 2.26080620% ./a.out 100000001 14.50s user 0.02s system 99% cpu 14.554 total Newton- 1:48PM-s009 ~/src/angband/src% time ./a.out 200000001 1000000000 calls, 1073732253 loop entries, test and branch success rate of 7.37322530% ./a.out 200000001 15.69s user 0.03s system 99% cpu 15.747 total Newton- 1:49PM-s009 ~/src/angband/src% printf '< %.2f microseconds per call\n' $((15.69 / 1000000000 * 1000000)) < 0.02 microseconds per call Newton- 1:49PM-s009 ~/src/angband/src% time ./a.out 2000000000 1000000000 calls, 1073726417 loop entries, test and branch success rate of 7.37264170% ./a.out 2000000000 15.52s user 0.02s system 99% cpu 15.570 total
As for the idea of expanding the RNG from 28 bits to 31 bits, that's not my call. I've compiled and am "playing" 4.0.4. (It looks nice!) I'm on the town level, and I should perhaps stay in town for now.
The edit:
Actually, thinking about this further, there may be a bug in what my test program is reporting as "test and branch success rate", as that should be a little under 7%, not a little over 7%. Also, there are definitely some efficiency advantages to expanding the range of Rand_div() using that algorithm. I'm not sure if the matter is worth bothering with though. I'll ponder the matter later after I deal with the lawn mower belt.Comment
-
What about "perf" instead of "time"? (Regardless of whether that's better, "time" is definitely not a good way to benchmark anything if you're looking a variation at the sub-second level.)
"perf" will at least show you 'cycle counts'.Comment
-
The reason that I became foucsed on the RNG was because changes that Ben made to the game slowed it down considerably. I profiled the game and noticed that the plurality (IIRC) of CPU time in the game was being spent in the RNG. Ben did "something" to reduce RNG calls while I focused on improving the efficiency of the RNG. Wanting the best for Angband, I added the algorithm to eliminate the inherent bias of the random() % N method (which was "satisfying").
BTW: My last Angband session was over 21 hours nonstop.Last edited by rhutson; August 17, 2016, 05:49.Comment
-
zsh: command not found: perf
I did find the Wikipedia page for 'perf', but something like that is not needed for what I was/am doing. I did note, e.g., '< 0.02 microseconds per call' as zsh's builtin 'time' reported the CPU usage for the entire program. After that, the math was simple enough. (getrusage() still exists, but it was not needed.)
Oh, there was not a bug in my test program. I encountered a printf terminology issue while multitasking, and after my unedited post, I noticed that what I thought should be 6.7% was 7.4%. Needing to commute, I edited and declared a possible bug in my hastily composed test program.
With the 31 bit version of the RNG, there is a worst case 6.7% chance of the branch being taken. But if the branch is taken, there is another 6.7% chance of the branch being taken again ... (Unnecessary careful study of earlier posts indicate that I was quite aware of relooping. // again, multitasking)
The 28 bit vs 31 bit RNG efficiency study is interesting, but would be more so in later days.Comment
-
My point was merely that 'perf' can tell you exact (well, as exact as possible) numbers for many things you might be interested in. (Like branch prediction and such.)Comment
-
Here is stripped down code of the program which I am speed testing with zsh's time (/usr/bin/time yields essentially identical results) :
Code:static int loops, calls; // stripped z-rand31.c begins } else { ++calls; /* Use a complex RNG */ while (1) { ++loops; // stripped z-rand31.c continues and ends // omitted test program stuff int main(int argc, char *argv[]) { // omitted test program stuff which includes converting argv[1] to the integer n for (i = 0; i < BIG; ++i) // I use a do-while loop, but this is more conventional. { if (randint0(n) >= n) die("Uh oh"); } printf("%d calls, %d loop entries, (loops - calls) / calls * 100: %.8lf\n", calls, loops, (double)(loops - calls) / calls * 100); }
I am now reminded of an amusing true story related as dialogue which I hope that you and some others appreciate:
"And so, the algorithm runs in `O(n log(n)^2) + 483' time."
"Professor, why did you include 483 in your analysis? I thought that we were supposed to removed constants from an analysis."
"Oh, you're right." // "+ 483" erased from chalkboard.
Addressing any concerns about the accuracy of a modern 'time', I do notice that Apple's unmaintained(?) man pages for /usr/bin/time do add a warning about the 'time' utilty:
BUGS
The granularity of seconds on microprocessors is crude and can result in times being reported for CPU usage which are too large by a second.
BSD June 6, 1993 BSD
Newton- 2:47PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000011 loop entries, (loops - calls) / calls * 100: 0.00000220
a.out 400 6.85s user 0.01s system 99% cpu 6.886 total
Newton- 2:47PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000014 loop entries, (loops - calls) / calls * 100: 0.00000280
a.out 400 6.86s user 0.01s system 99% cpu 6.890 total
Newton- 2:48PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000009 loop entries, (loops - calls) / calls * 100: 0.00000180
a.out 400 6.86s user 0.01s system 99% cpu 6.890 total
Newton- 2:48PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000014 loop entries, (loops - calls) / calls * 100: 0.00000280
a.out 400 6.94s user 0.02s system 99% cpu 6.983 total
Newton- 3:21PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000009 loop entries, (loops - calls) / calls * 100: 0.00000180
a.out 400 6.86s user 0.01s system 99% cpu 6.891 total
Newton- 3:21PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000009 loop entries, (loops - calls) / calls * 100: 0.00000180
a.out 400 6.86s user 0.02s system 99% cpu 6.901 total
Newton- 3:21PM-s001 ~/src/angband/src% time a.out 400
500000000 calls, 500000014 loop entries, (loops - calls) / calls * 100: 0.00000280
a.out 400 6.85s user 0.02s system 99% cpu 6.892 total
But remember, the main idea of the test program was to verify and assure that my unbiased algorithm was not looping a lot. 20 years ago or whenever, I used the "static int calls, loops;" method while testing the algorithm. I added a game command to occasionally print their values while playing the game. I don't recall the specifics, only that I did not observe any alarming looping. The shell speed measurements and numerical demonstration that randint0() has indeed been working correctly all of these years were add-ons.
P.S. Obviously, I enjoy discussing this subject with you. And I do need to check the looping performace of the 28-bit RNG.
PS2 The reason for the quick P.S. was that I needed to "disconnect", and I think that my post should have begun thusly:
"Thank you for sharing that info about 'perf'. However, I abandoned my 16 year Linux project in 2008 for personal reasons. Any advice or help offered by you or anyone else is always welcomed!"Last edited by rhutson; August 18, 2016, 05:00.Comment
Comment