RFC: Reworking sound sub-system

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • calris
    Adept
    • Mar 2016
    • 194

    #31
    Originally posted by AnonymousHero
    Since you mentioned it: On the subject of realloc(), I think mem_realloc() is buggy. I you see [url=https://github.com/angband/angband/blob/b662e8927190fee91d1450fa999c20a205762866/src/z-virt.c#L76]L76: This seems incorrect in that the addition here happens before the NULL check... thus ensuring that the "NULL" check can never be triggered. (It's usually not a problem on Linux due to overcommit, but malloc() can sometimes return NULL even if overcommit is allowed.)

    It also seems to be attempting some sort of "poisoning"/size tracking to protect against corruption, but this is empatically not what one should be doing these days. Just use UBSAN and ASAN. They do this fully automatically and work for all memory accesses, including regular array indexing.
    Just had a closer look at mem_realloc() and my immediate thought is WTF!

    Why on earth are we attempting to realloc starting NOT the pointer to the memory we are reallocating?

    And seriously, why do we have z-virt.c - I saw something in trac about nuking them. Seriously, custom wrappers around standard C functions are evil as.

    Comment

    • AnonymousHero
      Veteran
      • Jun 2007
      • 1393

      #32
      Originally posted by calris
      Ah, now I get it - Hash the string into, say, a u32 and just shove it into an array (which we can realloc be some fixed chunk size) and forget about sorting, just do a linear search. So all we need is either
      • A hash algorithm which guarantees no duplicate keys for non-equal strings, or
      • Some way to manage potential duplicate keys.


      I'm open to any ideas you have to offer.
      You don't even need any of that. Even if you get a few hash coollisions, the only consequence is that you'll occasionally be checking using strcmp(). That doesn't really matter.

      (If you really want to avoid duplicates, you could use the quark_* functions instead, but they seem to be more optimized for the case where you're *usually* adding an unknown string, so YMMV.)

      Wrt. choice of hash function: You can get arbitrarily advanced here, but FNV seems to be popular based on their list of users. Personally, I'd just go with the dead-simple Java String#hashCode() implementation (the formula is in the documentation). It's not great by any means (particularly if you have lots of non-ASCII characters, apparently), but until there's evidence that something better is needed, I wouldn't bother doing anything more advanced.

      Comment

      • AnonymousHero
        Veteran
        • Jun 2007
        • 1393

        #33
        Originally posted by calris
        Just had a closer look at mem_realloc() and my immediate thought is WTF!

        Why on earth are we attempting to realloc starting NOT the pointer to the memory we are reallocating?
        That was my thought as well, but after pondering it for a bit I came to the conclusion that it must be because it's reserving extra space for size tracking/poisoning.

        Originally posted by calris
        And seriously, why do we have z-virt.c - I saw something in trac about nuking them. Seriously, custom wrappers around standard C functions are evil as.
        Yeah, it's kind of silly, though it does provide a more convenient interface if you really just want to crash on OOM instead of having to sprinkle huge amounts of "if != null" checks all over the place.

        ("Real" realloc() also has an extremely error-prone interface which it looks like mem_realloc() mitigates at least somewhat.)
        Last edited by AnonymousHero; March 26, 2016, 10:33.

        Comment

        • calris
          Adept
          • Mar 2016
          • 194

          #34
          Originally posted by AnonymousHero
          You don't even need any of that. Even if you get a few hash coollisions, the only consequence is that you'll occasionally be checking using strcmp(). That doesn't really matter.
          OK, I'll admit, I don't get it... I've never really used hashing...

          I thought that the point of a hash was to map a complex key (like a string) into a simple numerical key which you can store in a linear array and just do a basic linear search.

          But if you map a string to a number, how will you know you have a collision?

          From what I can think, you hash the string, lookup the key, then check the string using strcmp() to test for a collision. But then what?

          Comment

          • calris
            Adept
            • Mar 2016
            • 194

            #35
            Originally posted by calris
            I thought that the point of a hash was to map a complex key (like a string) into a simple numerical key which you can store in a linear array and just do a basic linear search.
            OK, I've done some research

            The hash function maps the string into an index into a hash table (array) of some predefined size. The array needs to hold lists (linked lists, or sub-arrays than we can resize) of structures which contain the string key and the data (in this use case it's simply a 'sound id').

            I'm still not getting how this is not going to result in the hash table and it's associated lists being splattered all over memory, putting us right back to our previous cache-miss problem. I will freely admit that a good hash function with minimal collisions will result in less strcmp() operations that the tree.

            Comment

            • Pete Mack
              Prophet
              • Apr 2007
              • 6883

              #36
              Why is anyone worrying about cache misses in Angband???? Yeah, it's important in other situations, but not in a text-based adventure game for crying out loud.

              Comment

              • calris
                Adept
                • Mar 2016
                • 194

                #37
                Originally posted by AnonymousHero
                Yeah, it's kind of silly, though it does provide a more convenient interface if you really just want to crash on OOM instead of having to sprinkle huge amounts of "if != null" checks all over the place.
                Good point

                ("Real" realloc() also has an extremely error-prone interface which it looks like mem_realloc() mitigates at least somewhat.)
                How so? The only thing I can think of is that the results of passing a size of zero is implementation defined. z-term.c returns NULL on size == 0 (which is nice, since no matter WHAT compiler we use, or what system we run on, we know the behaviour)

                I'm now thinking this alone is worth keeping z-virt.c around for

                Comment

                • calris
                  Adept
                  • Mar 2016
                  • 194

                  #38
                  Originally posted by Pete Mack
                  Why is anyone worrying about cache misses in Angband???? Yeah, it's important in other situations, but not in a text-based adventure game for crying out loud.
                  Because some of us a purists

                  Seriously, I don't really care what is used - I just needed a data structure that was effective and reasonably efficient at converting sound name strings to more usable sound id's. I'm familiar with AVL trees, I've used them before, I'm comfortable using them. I haven't used hash tables before, so I never gave them a thought. But Angband is a community development effort, so I will happily bow to the community consensus.

                  Comment

                  • AnonymousHero
                    Veteran
                    • Jun 2007
                    • 1393

                    #39
                    Originally posted by Pete Mack
                    Why is anyone worrying about cache misses in Angband???? Yeah, it's important in other situations, but not in a text-based adventure game for crying out loud.
                    I'm don't particularly care very much about the performance aspects; I do care about the simplicity. (I just thought I'd explain about about the performance aspects.)

                    That said, you usually do want to choose a performant data structure when it doesn't incur (code) complexity (as in this case)... lest you actually do end up having to optimize and finding that you have a very flat profile.

                    Comment

                    • AnonymousHero
                      Veteran
                      • Jun 2007
                      • 1393

                      #40
                      Originally posted by calris
                      OK, I'll admit, I don't get it... I've never really used hashing...

                      I thought that the point of a hash was to map a complex key (like a string) into a simple numerical key which you can store in a linear array and just do a basic linear search.

                      But if you map a string to a number, how will you know you have a collision?

                      From what I can think, you hash the string, lookup the key, then check the string using strcmp() to test for a collision. But then what?
                      Collisions don't matter in this case since we wouldn't use the hash key to *index* into the array (as you would in e.g. a hash table). In that case you need to care about collisions. What I'm suggesting isn't really a hash table. It's just storing an extra "int" in the struct as a simple way to optimize the strcmp() away in most cases.

                      Comment

                      • AnonymousHero
                        Veteran
                        • Jun 2007
                        • 1393

                        #41
                        Originally posted by calris
                        How so? The only thing I can think of is that the results of passing a size of zero is implementation defined. z-term.c returns NULL on size == 0 (which is nice, since no matter WHAT compiler we use, or what system we run on, we know the behaviour)

                        I'm now thinking this alone is worth keeping z-virt.c around for
                        Almost nobody handles the return value of realloc() properly, esp. in the failure case. (Nor the size==0 case.)

                        It used to be a favorite sport of mine to look for bad uses of realloc() in code bases I happened to work on . Almost nobody got it right.

                        Comment

                        • calris
                          Adept
                          • Mar 2016
                          • 194

                          #42
                          Originally posted by AnonymousHero
                          Collisions don't matter in this case since we wouldn't use the hash key to *index* into the array (as you would in e.g. a hash table). In that case you need to care about collisions. What I'm suggesting isn't really a hash table. It's just storing an extra "int" in the struct as a simple way to optimize the strcmp() away in most cases.
                          Ah, I think I get it now...

                          Change:
                          Code:
                          struct sound_data {
                          	char *name;
                          	bool loaded;
                          	void *plat_data;
                          };
                          to

                          Code:
                          struct sound_data {
                          	char *name;
                          	u16b hash;
                          	bool loaded;
                          	void *plat_data;
                          };
                          And have sounds_list[] be unsorted.

                          When looking for a sound, hash it's name, scan sounds_list[] until we find a a match on hash, do a strcmp() to check, and if strcmp() returns non zero, keep scanning. If we don't find a match, just add a new entry into sounds_list[].

                          sound_list[] will be sorted by id (which is what we need during runtime) and will be reasonably efficient for sound name lookups because the number of elements will be small.

                          Elegant - I like this a lot

                          Comment

                          • calris
                            Adept
                            • Mar 2016
                            • 194

                            #43
                            Originally posted by calris
                            Just had a closer look at mem_realloc() and my immediate thought is WTF!

                            Why on earth are we attempting to realloc starting NOT the pointer to the memory we are reallocating?

                            And seriously, why do we have z-virt.c - I saw something in trac about nuking them. Seriously, custom wrappers around standard C functions are evil as.

                            Originally posted by AnonymousHero
                            That was my thought as well, but after pondering it for a bit I came to the conclusion that it must be because it's reserving extra space for size tracking/poisoning.
                            I would have thought this would be a seriously risky proposition. From my limited understanding of allocators, I thought they tracked chunks and sizes - what would happen if the pointer was at the start of a chunk and subtracting one would push it back into another chunk?

                            I'm looking at that code and I'm seriously worried by it's safety - surely how realloc() behaves when given a pointer that points to a location not returned by any of the alloc() functions is platform specific and, ergo, defies the whole purpose of wrapping it in the first place.

                            What if we performed this:

                            Code:
                            	char *fred;
                            	char *jane;
                            
                            	fred = mem_alloc(1);
                            	jane = mem_alloc(1);
                            
                            	jane = mem_realloc(jane, 5);
                            Might we run the risk of re-allocating fred instead of jane?

                            Comment

                            • AnonymousHero
                              Veteran
                              • Jun 2007
                              • 1393

                              #44
                              Originally posted by calris
                              Ah, I think I get it now...

                              (snip)
                              When looking for a sound, hash it's name, scan sounds_list[] until we find a a match on hash, do a strcmp() to check, and if strcmp() returns non zero, keep scanning. If we don't find a match, just add a new entry into sounds_list[].

                              sound_list[] will be sorted by id (which is what we need during runtime) and will be reasonably efficient for sound name lookups because the number of elements will be small.

                              Elegant - I like this a lot
                              Exactamundo!

                              Comment

                              • calris
                                Adept
                                • Mar 2016
                                • 194

                                #45
                                Originally posted by AnonymousHero
                                Exactamundo!
                                I have learned something new today. For the next ten minutes, my education is complete

                                Comment

                                Working...
                                😀
                                😂
                                🥰
                                😘
                                🤢
                                😎
                                😞
                                😡
                                👍
                                👎