Why the Angband random number generator is currently limited to 28 bits explained

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • rhutson
    Rookie
    • Jul 2013
    • 23

    #16
    Originally posted by brbrbr
    I am not developer, but very interested:

    1) It's year 2016 and the 28 bit mutation remains. Maybe it's time to remove it from z-rand.c as rhutson suggested ????
    Feel Freel

    Originally posted by brbrbr
    The bug means all randint0 generate 1 lower values.
    once you have reached the Implementor level.

    Comment

    • rhutson
      Rookie
      • Jul 2013
      • 23

      #17
      Originally posted by AnonymousHero
      You're certainly right that using a while loop to do this is absolutely horrible for performance -- not sure why it's done this way (but see below). I'm sure there are much more efficient ways to do it, though I don't know the right search keywords to find an appropriate algorithm on Wikipedia.
      The branch is rarely taken in my algorithm. I counted CPU cycles, and I would have never ever injected the inefficient abomination being described into Angband.

      Requestfully, [someone] Git the source code, stick in global variables and report the ratio of successful test-branches / function calls. The ratio should be quite, quite low. (I have the source code, but the last time that I played Angband, I started at 6 pm. I had a level 20 character, but I was very sleepy. I looked at the time and wondered why I was so sleepy at only 3 am. Thirty minutes later, I realized that it was 3:30 pm and that I had been playing Angband for 21 1/2 hours. I'm too old for that and don't intend to play Angband again.)

      While hardly a reference, the one - but not asserted only- person on this exchange http://stackoverflow.com/questions/1...mber-generator who can be trusted is Rob Napier. Java's implementation seems to me to be the most efficient, while I would say that my algorithm is slightly more efficient for the purposes of Angband than the arc4 algorithm. The performance differences between the 3 algorithms would not even be measurable in microseconds. The primary difference between the 3 algorithms is that I know that I was not paid to design my algorithm.


      Originally posted by AnonymousHero
      The "while (1)" method doesn't suffer from this bias.

      I'm not sure if this actually matters in practice for a game like Angband, but it obviously matters for statistical applications.
      Correct. Angband's current RNG implementation does not suffer from any bias.

      My motivation for putting my algorithm into the game was so that I could sleep well knowing that tomorrow's prayers and spells would be answered or rejected efficiently and unbiasedly. There is little practical difference between the old fashioned mod % N method versus using an unbiased algorithm. I just couldn't sleep knowing that Angband's RNG was not perfectly unbiased.
      Last edited by rhutson; August 16, 2016, 07:58.

      Comment

      • rhutson
        Rookie
        • Jul 2013
        • 23

        #18
        Originally posted by brbrbr
        Ok, that explains why simple modulo was not used
        And you now know exactly why a simple modulo was inconsistent with my continued existence.

        Originally posted by brbrbr
        Now, I have an issue understanding the "Rand_div" code.
        For example:
        Code:
        /* Partition size */
        n = (0x10000000 / m);
        The usual call for randint is with m=100(decimal), that means the partitions are not of same size, which MAY bias the random number generator.
        You seem to me to be an eager student. I would simply share the headerless contents of the email that I sent to Ben explaining how the algorithm works, but I did not maintain a .record in those days. The easiest way for me to explain the algorithm to you would be to use pencil and paper and then scan and upload my diagrams.

        Easier for me and as mind food for eager students, read Ben's comments in z-rand.c and then play around with a program that I wrote just today:

        Code:
        #include <stdio.h>
        #include <stdint.h>
        #include <stdlib.h>
        #include <assert.h>
        
        uint32_t randit(uint32_t m)
        {
            uint32_t n;
        
            n = (0x10000000 / m);
        
            return n;
        }
        
        int main(int argc, char *argv[])
        {
            uint32_t i;
        
            assert(argc == 2);
            assert((i = atoi(argv[1])) >= 0);
            assert(i <= 0x10000000);
        
            printf("Part size %u\n", randit(i));
        }
        Then, please report back your findings! (I do apologize, and no one else would likely relate that my mind is mostly equally focused on the Pros and Cons of Hanging Guitars from a Vaulted Ceiling versus a floorful of guitar stands juxtaposed with the apparent impossiblity of replacing a John Deere lawn mower belt without removing the mower deck.)

        Originally posted by brbrbr
        It would be much cleaner and faster to use unbiased modulo code:
        Code:
        #define RAND_MAX=0xFFFFFFFF; 
        while(1)
        {
          r =  WELLRNG1024a();
          if (r < RAND_MAX - RAND_MAX % m)
            return r % m;
        }
        That code will run 1 time most of the time. Unbiased.
        With parameters of unknown value at compile time, mod is still slightly more expensive than div.
        Last edited by rhutson; August 16, 2016, 10:19.

        Comment

        • AnonymousHero
          Veteran
          • Jun 2007
          • 1393

          #19
          Originally posted by rhutson
          (About efficiency of the algorithm.)
          Right, I think I originally misread the code, but just never corrected myself.

          Comment

          • rhutson
            Rookie
            • Jul 2013
            • 23

            #20
            Originally posted by AnonymousHero
            Right, I think I originally misread the code, but just never corrected myself.
            I saw this discussion back in June, but clearly, my attention has been attending to more important matters, hence my delayed response.

            There is a Lurking Humor in last night's posts. I was a couple of hours into a detailed study comparing the efficiencies and estimating the CPU cycles used by the 3 algorithms mentioned along with speed testing code, and then I wisely decided to use Nick's approach of "Trust me. I wrote it."€ť

            As for the efficiency of my algorithm, I was pleased to discover that it is practically as efficient as reference grade library routines. The algorithm is just more obscure since I was dealing with a different core RNG which was known to produce less than pseudorandom sequences in the lower bits.

            As for the 28 bit limitation, that can now be safely and trivially bumped up to a more classic 31 bit limitation. If it is really an issue, email me, and I'll email you back some diffs. But any major rewrite of known working code in such a vital section of the game for no obvious reason would be an ill-advised waste of time.
            Last edited by rhutson; August 16, 2016, 19:11.

            Comment

            • AnonymousHero
              Veteran
              • Jun 2007
              • 1393

              #21
              Originally posted by rhutson
              As for the 28 bit limitation, that can now be safely and trivially bumped up to a more classic 31 bit limitation. If it is really an issue, email me, and I'll email you back some diffs. But any major rewrite of known working code in such a vital section of the game for no obvious reason would be an ill-advised waste of time.
              Funnily enough, my interest in this was only tangential and coincidental to replacing the (really, really old) RNG in the ToME 2.x codebase. I've replaced it with a PCG family RNG which gives me a standard C++ interface which means i also get randnor() without any 'weird' tables, etc. It's all good from my perspective .

              I mostly wanted the replacement for the same somewhat perverse reasons as yourself, but I also want to reduce the amount of "in-house" code and since PCG is a header-only implementation, it's completely trivial to include it.

              (I haven't yet pushed this to the public GH repo. I should probably start gathering commits to push...)

              Comment

              • rhutson
                Rookie
                • Jul 2013
                • 23

                #22
                Good news everyone!

                I morphed z-rand.c into a Rand_div() speed test and study program. My memory of prior testing and claim that any attempt at optimizing my algorithm any further could not even be measured in a full microseconds have been validated! And I expanded the range and domain of Rand_div into 31 bit territory as well.

                My speed test and the loop test reults on a 2.6 GHz Intel Core i5 compiled with clang -O2 on OS X 10.11.6 :

                Code:
                Newton- 1:39PM-s009 ~/src/angband/src% time ./a.out 2
                1000000000 calls, 1000000000 loop entries, test and branch success rate of  0.00000000%
                ./a.out 2  14.03s user 0.02s system 99% cpu 14.082 total
                Newton- 1:40PM-s009 ~/src/angband/src% printf '< %.2f microseconds per call\n' $((14.03 / 1000000000 * 1000000))
                < 0.01 microseconds per call
                Newton- 1:40PM-s009 ~/src/angband/src% time ./a.out 100                                                         
                1000000000 calls, 1000000024 loop entries, test and branch success rate of  0.00000240%
                ./a.out 100  14.00s user 0.02s system 99% cpu 14.046 total
                Newton- 1:40PM-s009 ~/src/angband/src% time ./a.out 100000000
                1000000000 calls, 1022613980 loop entries, test and branch success rate of  2.26139800%
                ./a.out 100000000  14.50s user 0.02s system 99% cpu 14.550 total
                Newton- 1:42PM-s009 ~/src/angband/src% printf '< %.2f microseconds per call\n' $((14.5 / 1000000000 * 1000000))
                < 0.01 microseconds per call
                Newton- 1:42PM-s009 ~/src/angband/src% time ./a.out 100000001                                                  
                1000000000 calls, 1022608062 loop entries, test and branch success rate of  2.26080620%
                ./a.out 100000001  14.50s user 0.02s system 99% cpu 14.554 total
                Newton- 1:48PM-s009 ~/src/angband/src% time ./a.out 200000001
                1000000000 calls, 1073732253 loop entries, test and branch success rate of  7.37322530%
                ./a.out 200000001  15.69s user 0.03s system 99% cpu 15.747 total
                Newton- 1:49PM-s009 ~/src/angband/src% printf '< %.2f microseconds per call\n' $((15.69 / 1000000000 * 1000000))
                < 0.02 microseconds per call
                Newton- 1:49PM-s009 ~/src/angband/src% time ./a.out 2000000000                                                  
                1000000000 calls, 1073726417 loop entries, test and branch success rate of  7.37264170%
                ./a.out 2000000000  15.52s user 0.02s system 99% cpu 15.570 total
                My program called Rand_div() via the randint0 macro 1 billion times requesting a random number from 0 to n-1 where n is specified via command line. As you can see, the bias issue begins at not being an issue. As n becomes a relatively very large 2 billion, the loop can cycle at least once around 7.4% of the time. For the purposes of the Angband that I played, the unbiased algorithm added no measurable overhead during gameplay. (I can share my test program, but I am probably the only one interested in it.)

                As for the idea of expanding the RNG from 28 bits to 31 bits, that's not my call. I've compiled and am "playing" 4.0.4. (It looks nice!) I'm on the town level, and I should perhaps stay in town for now.

                The edit:

                Actually, thinking about this further, there may be a bug in what my test program is reporting as "test and branch success rate", as that should be a little under 7%, not a little over 7%. Also, there are definitely some efficiency advantages to expanding the range of Rand_div() using that algorithm. I'm not sure if the matter is worth bothering with though. I'll ponder the matter later after I deal with the lawn mower belt.
                Last edited by rhutson; August 17, 2016, 00:31. Reason: test program bug?

                Comment

                • AnonymousHero
                  Veteran
                  • Jun 2007
                  • 1393

                  #23
                  What about "perf" instead of "time"? (Regardless of whether that's better, "time" is definitely not a good way to benchmark anything if you're looking a variation at the sub-second level.)

                  "perf" will at least show you 'cycle counts'.

                  Comment

                  • rhutson
                    Rookie
                    • Jul 2013
                    • 23

                    #24
                    Originally posted by AnonymousHero
                    I mostly wanted the replacement for the same somewhat perverse reasons as yoursel
                    I have developed a "problem" with joking on technical forums. This can cause confusion and create credibility issues. I will try to restrain myself as you and others are interested in learning.

                    The reason that I became foucsed on the RNG was because changes that Ben made to the game slowed it down considerably. I profiled the game and noticed that the plurality (IIRC) of CPU time in the game was being spent in the RNG. Ben did "something" to reduce RNG calls while I focused on improving the efficiency of the RNG. Wanting the best for Angband, I added the algorithm to eliminate the inherent bias of the random() % N method (which was "satisfying").

                    BTW: My last Angband session was over 21 hours nonstop.
                    Last edited by rhutson; August 17, 2016, 05:49.

                    Comment

                    • rhutson
                      Rookie
                      • Jul 2013
                      • 23

                      #25
                      Originally posted by AnonymousHero
                      What about "perf" instead of "time"? (Regardless of whether that's better, "time" is definitely not a good way to benchmark anything if you're looking a variation at the sub-second level.
                      Newton- 9:50PM-s001 ~/src/angband/src% perf
                      zsh: command not found: perf



                      I did find the Wikipedia page for 'perf', but something like that is not needed for what I was/am doing. I did note, e.g., '< 0.02 microseconds per call' as zsh's builtin 'time' reported the CPU usage for the entire program. After that, the math was simple enough. (getrusage() still exists, but it was not needed.)

                      Oh, there was not a bug in my test program. I encountered a printf terminology issue while multitasking, and after my unedited post, I noticed that what I thought should be 6.7% was 7.4%. Needing to commute, I edited and declared a possible bug in my hastily composed test program.

                      With the 31 bit version of the RNG, there is a worst case 6.7% chance of the branch being taken. But if the branch is taken, there is another 6.7% chance of the branch being taken again ... (Unnecessary careful study of earlier posts indicate that I was quite aware of relooping. // again, multitasking)

                      The 28 bit vs 31 bit RNG efficiency study is interesting, but would be more so in later days.

                      Comment

                      • AnonymousHero
                        Veteran
                        • Jun 2007
                        • 1393

                        #26
                        Originally posted by rhutson
                        Newton- 9:50PM-s001 ~/src/angband/src% perf
                        zsh: command not found: perf



                        I did find the Wikipedia page for 'perf', but something like that is not needed for what I was/am doing.
                        My point was merely that 'perf' can tell you exact (well, as exact as possible) numbers for many things you might be interested in. (Like branch prediction and such.)

                        Comment

                        • rhutson
                          Rookie
                          • Jul 2013
                          • 23

                          #27
                          Originally posted by AnonymousHero
                          My point was merely that 'perf' can tell you exact (well, as exact as possible) numbers for many things you might be interested in. (Like branch prediction and such.)
                          I understood your point, but I don't think that you understand what I am doing. 'perf' looks like a tool that I would be using if I was still playing and contributing to Angband while running Linux or even if there was a OS X version. 'perf' does appears to be a good tool for studying the performance of the game itself.

                          Here is stripped down code of the program which I am speed testing with zsh's time (/usr/bin/time yields essentially identical results) :

                          Code:
                          static int loops, calls;
                          
                          // stripped z-rand31.c begins
                          
                                  } else {
                                          ++calls;
                                          /* Use a complex RNG */
                                          while (1) {
                                              ++loops;
                          // stripped z-rand31.c continues and ends
                          
                          // omitted test program stuff
                          
                          int main(int argc, char *argv[])
                          {
                          // omitted test program stuff which includes converting argv[1] to the integer n
                          
                              for (i = 0; i < BIG; ++i)        // I use a do-while loop, but this is more conventional.
                              {
                                  if (randint0(n) >= n)
                                      die("Uh oh");
                              }
                          
                              printf("%d calls, %d loop entries, (loops - calls) / calls * 100: %.8lf\n", calls, loops, (double)(loops - calls) / calls * 100);
                          }
                          The 'time' utility is the right tool for speed testing what I am doing. 'time' is reporting the CPU time of code besides the RNG, but that extra code uses trival CPU time, and it uses a constant amount of time. The constant extra trivial CPU time is what makes the 'time' utility the right tool for optimizing my spare time , and, again, is also why I qualified the results with '<'.

                          I am now reminded of an amusing true story related as dialogue which I hope that you and some others appreciate:

                          "And so, the algorithm runs in `O(n log(n)^2) + 483' time."

                          "Professor, why did you include 483 in your analysis? I thought that we were supposed to removed constants from an analysis."

                          "Oh, you're right." // "+ 483" erased from chalkboard.

                          Addressing any concerns about the accuracy of a modern 'time', I do notice that Apple's unmaintained(?) man pages for /usr/bin/time do add a warning about the 'time' utilty:

                          BUGS
                          The granularity of seconds on microprocessors is crude and can result in times being reported for CPU usage which are too large by a second.

                          BSD June 6, 1993 BSD
                          That probably was true with BSD in 1993, and I do recall 'time' not having much granularity on 4.2 BSD. However, I've never seen such a bug of that magnitude under Linux or OS X (which I primarily switched to in 2008). I am getting results that are consistent enough for my purposes using getrusage() via zsh or /usr/bin/time:

                          Newton- 2:47PM-s001 ~/src/angband/src% time a.out 400
                          500000000 calls, 500000011 loop entries, (loops - calls) / calls * 100: 0.00000220
                          a.out 400 6.85s user 0.01s system 99% cpu 6.886 total
                          Newton- 2:47PM-s001 ~/src/angband/src% time a.out 400
                          500000000 calls, 500000014 loop entries, (loops - calls) / calls * 100: 0.00000280
                          a.out 400 6.86s user 0.01s system 99% cpu 6.890 total
                          Newton- 2:48PM-s001 ~/src/angband/src% time a.out 400
                          500000000 calls, 500000009 loop entries, (loops - calls) / calls * 100: 0.00000180
                          a.out 400 6.86s user 0.01s system 99% cpu 6.890 total
                          Newton- 2:48PM-s001 ~/src/angband/src% time a.out 400
                          500000000 calls, 500000014 loop entries, (loops - calls) / calls * 100: 0.00000280
                          a.out 400 6.94s user 0.02s system 99% cpu 6.983 total
                          Newton- 3:21PM-s001 ~/src/angband/src% time a.out 400
                          500000000 calls, 500000009 loop entries, (loops - calls) / calls * 100: 0.00000180
                          a.out 400 6.86s user 0.01s system 99% cpu 6.891 total
                          Newton- 3:21PM-s001 ~/src/angband/src% time a.out 400
                          500000000 calls, 500000009 loop entries, (loops - calls) / calls * 100: 0.00000180
                          a.out 400 6.86s user 0.02s system 99% cpu 6.901 total
                          Newton- 3:21PM-s001 ~/src/angband/src% time a.out 400
                          500000000 calls, 500000014 loop entries, (loops - calls) / calls * 100: 0.00000280
                          a.out 400 6.85s user 0.02s system 99% cpu 6.892 total
                          I do need to come up with a better name for the "(loops - calls) / calls * 100" metric, but at least it is now not "wrong", but just lacking in meaning. (It's just a test program that I'm working on in my spare time.)

                          But remember, the main idea of the test program was to verify and assure that my unbiased algorithm was not looping a lot. 20 years ago or whenever, I used the "static int calls, loops;" method while testing the algorithm. I added a game command to occasionally print their values while playing the game. I don't recall the specifics, only that I did not observe any alarming looping. The shell speed measurements and numerical demonstration that randint0() has indeed been working correctly all of these years were add-ons.

                          P.S. Obviously, I enjoy discussing this subject with you. And I do need to check the looping performace of the 28-bit RNG.

                          PS2 The reason for the quick P.S. was that I needed to "disconnect", and I think that my post should have begun thusly:

                          "Thank you for sharing that info about 'perf'. However, I abandoned my 16 year Linux project in 2008 for personal reasons. Any advice or help offered by you or anyone else is always welcomed!"
                          Last edited by rhutson; August 18, 2016, 05:00.

                          Comment

                          Working...
                          😀
                          😂
                          🥰
                          😘
                          🤢
                          😎
                          😞
                          😡
                          👍
                          👎