RFC: Reworking sound sub-system

**Nick** · March 25, 2016, 22:36

After considerable testing (mainly of my Linux system rather than your code, thank you pulseaudio) I can report the following:

I can hear sounds in Linux!
You need to add sound.prf to lib/customize/Makefile
My sound did not work with mp3s, I needed ogg - which I suspect will be the case for many people (but not those using Ubuntu, it seems)
I made .ogg versions of Dubtrain's sounds and manually copied them to my sound install directory - we need to decide what we're doing about that
I get the sounds just from running angband - no need for any commandline switch; I did use the --enable-sdl-mixer switch to configure
I haven't really come to terms with what the balanced trees are doing, but I'll look at that now

**calris** · March 26, 2016, 01:09

Yes, sound can be a PITA in Linux depending on the distribution - mostly because each sound library needs it's out additional packages to support various formats

I think we need sounds in another git repo and have all the different formats we support in that repo (otherwise the main repo gets too big to to download for someone wanting to do a quick code hack when the probably already have the sound files in their angband install anyway). Install maintainers can still build installers with the sound files included so the user does not have to download an additional 'sound pack'

The AVL tree code I shamelessly stole from here:

Index of /software/tree/tree-1.0

http://piumarta.com/software/tree/tree-1.0/

I haven't dug into the code much either. I have implemented AVL tress in C++ before, so I have a good understanding of their mechanics. Basically, all operations (insert, find, delete) are O(LogN). Anywhere we currently used linked-list to store large amounts of data (especially when we are constantly adding and removing data) with a single search key can be replaced by a tree - they are much faster.

**takkaria** · March 26, 2016, 02:20

Originally posted by calris

Yes, sound can be a PITA in Linux depending on the distribution - mostly because each sound library needs it's out additional packages to support various formats

I think we need sounds in another git repo and have all the different formats we support in that repo (otherwise the main repo gets too big to to download for someone wanting to do a quick code hack when the probably already have the sound files in their angband install anyway). Install maintainers can still build installers with the sound files included so the user does not have to download an additional 'sound pack'

Looks like the best way to do this would be using the 'BFG': https://rtyley.github.io/bfg-repo-cleaner/ Since doing this will mess up any merges with code from before deletion, a good time to do this would be after all pull requests/active branches are merged.

I think if we put all the sounds in a separate repo we should also put the sound config in the lib/sounds directory too so that all sound-related stuff is in one place and we don't have dangling links. If we update the sounds we don't want to have to update two repositories.

Maybe if the plan is to alter git history to optimise for file storage, we could also make sure that all the PNG files in the repository are fully optimised and maybe even get rid of the BMP files in lib/xtra/graf from the pre-3.3 era. Just the 32x32 Gervais tileset in BMP form is bigger than all the mp3s combined.

**AnonymousHero** · March 26, 2016, 03:26

Originally posted by takkaria

Looks like the best way to do this would be using the 'BFG': https://rtyley.github.io/bfg-repo-cleaner/ Since doing this will mess up any merges with code from before deletion, a good time to do this would be after all pull requests/active branches are merged.

For the love of all that's holy, don't do that. It's hugely disruptive to anyone who currently has a clone somewhere and you have absolutely no idea how many people do. You should only even be considering rewriting (published) history in extreme scenarios such as: Someone accidentally published the nuclear launch codes (and the codes themselves cannot easily be changed), or there's some kind of legal threat where removing files at the "master" revision is not acceptable to the other party.

Just use a separate repository + the "submodule" functionality of git. That would mean that fetching the sounds is a simple "git submodule init && git submodule update" command away.

(Btw, I'm not sure if you're already familiar with it, but the tool is probably named after the BFG 9000 from Doom. That should tell you something about when it's appropriate to use it.)

**calris** · March 26, 2016, 03:44

Good point. The horse has already bolted for mp3 files - unless we practically destroy the entire history of the repo it's always going to be a huge repo to clone. But at least we can stop it getting even more out of hand.

**takkaria** · March 26, 2016, 03:44

Originally posted by AnonymousHero

For the love of all that's holy, don't do that. It's hugely disruptive to anyone who currently has a clone somewhere and you have absolutely no idea how many people do. You should only even be considering rewriting (published) history in extreme scenarios such as: Someone accidentally published the nuclear launch codes (and the codes themselves cannot easily be changed), or there's some kind of legal threat where removing files at the "master" revision is not acceptable to the other party.

Just use a separate repository + the "submodule" functionality of git. That would mean that fetching the sounds is a simple "git submodule init && git submodule update" command away.

Thanks for this, I had a vague intuition it would be a bad idea but it's nice to have it laid out. The thing is that moving the mp3s into a submodule won't remove them when cloning the repository, which means that removing them is sort of pointless. They might as well stay there with a separate download of oggs for Linux users.

**Nick** · March 26, 2016, 03:52

Originally posted by AnonymousHero

(Btw, I'm not sure if you're already familiar with it, but the tool is probably named after the BFG 9000 from Doom. That should tell you something about when it's appropriate to use it.)

IIRC that was always, as soon as you acquired it

**AnonymousHero** · March 26, 2016, 03:53

Originally posted by calris

The AVL tree code I shamelessly stole from here:

Index of /software/tree/tree-1.0

http://piumarta.com/software/tree/tree-1.0/

(EDIT: Nevermind, seems to be a BSD/MITish license, but it really needs to be checked.)

Originally posted by calris

Basically, all operations (insert, find, delete) are O(LogN). Anywhere we currently used linked-list to store large amounts of data (especially when we are constantly adding and removing data) with a single search key can be replaced by a tree - they are much faster.

Firstly, you almost never want to use linked lists. Not even if you don't care about performance. (There are plausible use cases for so-called "intrusive" linked lists, but that's an aside.) EDIT: I realize you didn't advocate for linked lists, just pointing it out for those reading along

.

Secondly, "lower algorithmic complexity" does not automatically translate to "faster". (Thus, it's important to not conflate the two!). Constant factors of course also matter a lot, but even more importantly: The tradeoff between RAM latency vs. throughput on modern CPUs means that any kind of pointer-chasing is penalized to an extreme degree. (There are so-called cache-oblivious algorithms/data structures which can alleviate many of these things, but often at a cost of greatly increased implementation complexity.)

Thirdly, while you're right about the algorithmic complexity of (AVL) trees, this is not the end of the story.
Unless you have huge[0] numbers of items and/or truly need the range support of trees, you're almost always
better of with a simple flat array of as-flat-as-possible structs. For searching simply do a linear search[1]. The reason this is almost always better is simple: CPUs these days a) have huge caches, and b) do prefetching like you wouldn't believe. (Those two things mean that any kind of intensive pointer-hopping -- like that required by trees -- incurs a huge latency penalty wrt. accessing RAM.)

I believe the scenario you're describing here is: a) data is only loaded/inserted once, and b) you're looking at <= 10000 entries (so "not huge"), and c) you don't need huge amounts of in-line data for each entry.

I would recommend something like the following:

Code:

   struct entry {
       uint32_t key_hash;   // Some simple hash of the string in "key"
       const char *key;   
       // ...
       uint8_t *sound_data;    // I'm assuming loaded sound data would go here
   }

Load all your data into a simple array of "struct entry". When you need to find something simply hash your lookup key, do a linear search for a matching "key_hash", making sure the real key actually matches (otherwise, skip to next).

I'm basing the use of "key_hash" on the assumption that you want to look things up by name and the fact that you'd need to do pointer-chasing to compare anything against "key" (plus strcmp is linear-time, so you want to avoid it in an inner loop). If that's not the case and you have some simple constant-size key, just use the key directly.

Since the sound data is huge, it doesn't matter that we access that through a pointer indirection. (Heck, you could probably load it from file every time you need it.)

Fourthly, Most non-trivial tree algorithms are much more complicated implementation-wise than a simple linear search. Unless you have a good test harness (e.g. QuickCheck or something like that), you're likely to have bugs in such an implementation.

[0] Obviously what's "huge" varies and you'd really need benchmarks to pinpoint the exact number in this exact case. Plus it may depend on cache sizes, etc. However, a good ball-park estimate might be anything from 10000-1000000 depending on the size of the "struct" you need to store.

[1] Even sorting the array and doing a binary search would likely be slower than linear search simply due to the pointer-chasing that this incurs.

EDIT: I should say... none of this probably matters very much on modern machines for the number of elements we're talking about (and because lookups would be so rare, comparatively speaking), so I'd say the most important thing here is: Do the simplest thing possible (i.e. avoid bugs)... which coincidentally is also an array

.

**AnonymousHero** · March 26, 2016, 03:54

Originally posted by Nick

IIRC that was always, as soon as you acquired it

Heh! Are you channelling debo, right now?

(I guess when you're alone against the world, the collateral damage doesn't matter that much, but when you do care who else gets hit... watch out!)

**AnonymousHero** · March 26, 2016, 03:59

Originally posted by takkaria

Thanks for this, I had a vague intuition it would be a bad idea but it's nice to have it laid out. The thing is that moving the mp3s into a submodule won't remove them when cloning the repository, which means that removing them is sort of pointless. They might as well stay there with a separate download of oggs for Linux users.

Right, but removing/moving them in the main repo would mean that you could clone with "--max-depth=1" and avoid downloading them. (Plus, it would mean that further revisions don't take up space in the main repository.)

I would think that's enough motiation to do it even if it would mean that people actively working on sound (few and far between it seems) would effectively have to download them twice[1].

[1] Well, they'd have to download all revisions pre-move (unless using --max-depth) (x1), the exact revisions that were moved (x2), and all revisions post-move (x1).

**takkaria** · March 26, 2016, 05:09

Originally posted by AnonymousHero

Right, but removing/moving them in the main repo would mean that you could clone with "--max-depth=1" and avoid downloading them. (Plus, it would mean that further revisions don't take up space in the main repository.)

I would think that's enough motiation to do it even if it would mean that people actively working on sound (few and far between it seems) would effectively have to download them twice[1].

[1] Well, they'd have to download all revisions pre-move (unless using --max-depth) (x1), the exact revisions that were moved (x2), and all revisions post-move (x1).

Aaah. I didn't know about the depth setting. That makes tons more sense now, thanks.

**calris** · March 26, 2016, 05:30

Originally posted by AnonymousHero

(Assuming you linked to the correct place and unless you wrote that code yourself: ) DANGER! That code does not appear to have any kind of license and that's hugely problematic.

It does have a license, and it is as permissive as they come:

Code:

/* Copyright (c) 2005 Ian Piumarta
 * 
 * All rights reserved.
 * 
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the 'Software'), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, and/or sell copies of the
 * Software, and to permit persons to whom the Software is furnished to do so,
 * provided that the above copyright notice(s) and this permission notice appear
 * in all copies of the Software and that both the above copyright notice(s) and
 * this permission notice appear in supporting documentation.
 *
 * THE SOFTWARE IS PROVIDED 'AS IS'.  USE ENTIRELY AT YOUR OWN RISK.
 */

Secondly, "lower algorithmic complexity" does not automatically translate to "faster". (Thus, it's important to not conflate the two!). Constant factors of course also matter a lot, but even more importantly: The tradeoff between RAM latency vs. throughput on modern CPUs means that any kind of pointer-chasing is penalized to an extreme degree. (There are so-called cache-oblivious algorithms/data structures which can alleviate many of these things, but often at a cost of greatly increased implementation complexity.)

Thirdly, while you're right about the algorithmic complexity of (AVL) trees, this is not the end of the story.
Unless you have huge[0] numbers of items and/or truly need the range support of trees, you're almost always
better of with a simple flat array of as-flat-as-possible structs. For searching simply do a linear search[1]. The reason this is almost always better is simple: CPUs these days a) have huge caches, and b) do prefetching like you wouldn't believe. (Those two things mean that any kind of intensive pointer-hopping -- like that required by trees -- incurs a huge latency penalty wrt. accessing RAM.)

Yep - well aware that reducing algorithmic (Big-O) complexity isn't the be all and end all. You will notice that I did use a fairly simple array that I use realloc() to append fixed-size chunks to to map sound id's to sound data. I _could_ have used a tree or a list, but the array is by far the best option beacuse:
a) Elements are never removed until shutdown
b) Element id's increment linearly from 0 as new sounds are identified

I believe the scenario you're describing here is: a) data is only loaded/inserted once, and b) you're looking at <= 10000 entries (so "not huge"), and c) you don't need huge amounts of in-line data for each entry.

I would recommend something like the following:

Code:

   struct entry {
       uint32_t key_hash;   // Some simple hash of the string in "key"
       const char *key;   
       // ...
       uint8_t *sound_data;    // I'm assuming loaded sound data would go here
   }

Load all your data into a simple array of "struct entry". When you need to find something simply hash your lookup key, do a linear search for a matching "key_hash", making sure the real key actually matches (otherwise, skip to next).

I'm basing the use of "key_hash" on the assumption that you want to look things up by name and the fact that you'd need to do pointer-chasing to compare anything against "key" (plus strcmp is linear-time, so you want to avoid it in an inner loop). If that's not the case and you have some simple constant-size key, just use the key directly.

The issues are:

Sounds are defined by a string name that we have to keep referring to while we are processing the sounds preferences. Every time we process a sound entry in the pref file, we need to match it to an id (or create a new id for the sound). The sound id is what is used to store which sounds are used for given messages (i.e. we do not store the 'name' of the sound, otherwise we would end up with a ton of allocated strings to hold the names.
We don't come across sound names in the pref files in alphabetical order, so if we store them in an array, we will be doing a lot of linear searches using strcmp()

Since the sound data is huge, it doesn't matter that we access that through a pointer indirection. (Heck, you could probably load it from file every time you need it.)

The sound data itself is stored in a simple array and referenced by 'sound id'. Remember, the 'sounds tree' maps each sound's textual name to it's associated id. It is only hit when we process a pref file. Remember, we can process a pref file at ANY time during game play - it's not a one-off thing. We need the sound name/id map to stick around in case with mess with the message/sound map.

Fourthly, Most non-trivial tree algorithms are much more complicated implementation-wise than a simple linear search. Unless you have a good test harness (e.g. QuickCheck or something like that), you're likely to have bugs in such an implementation.

Hence why I 'stole' the code

[0] Obviously what's "huge" varies and you'd really need benchmarks to pinpoint the exact number in this exact case. Plus it may depend on cache sizes, etc. However, a good ball-park estimate might be anything from 10000-1000000 depending on the size of the "struct" you need to store.

It's more than just how much data you're storing - it's also about how often you need to perform operations on the data (find, insert, delete)

[1] Even sorting the array and doing a binary search would likely be slower than linear search simply due to the pointer-chasing that this incurs.

Don't forget that we are dealing with arbitrary, randomly ordered, variable length, string keys here. So if we use a binary search on an array of struct containing the key/data pair, the keys are going to be splattered all over memory anyway because of the extra level of indirection need for the char *sound_name

EDIT: I should say... none of this probably matters very much on modern machines for the number of elements we're talking about (and because lookups would be so rare, comparatively speaking), so I'd say the most important thing here is: Do the simplest thing possible (i.e. avoid bugs)... which coincidentally is also an array

.

The message<->sound mapping used to be in an array of size [MAX_MESSAGES][16] (i.e. a maximum of 16 randomised sounds per message). This is kind of OK if we consider that MOST messages had a sound allocated to it. It still hasted a lot of memory, but it wasn't THAT bad considering the reduced code complexity.

But now Nick is thinking forward to the possibility of moving message strings into a data file - the number of messages if going to grow to about 350 or so. That's going to be something like 20k of MOSTLY unused memory for the message/sound map.

NOTE: I have a background in embedded development - for me, every byte counts. I'm more than happy to ditch the message<->sound tree in favour of a fixed 20kB array, but I'll need to be convinced on the sound name<->sound id map.

**AnonymousHero** · March 26, 2016, 08:43

(Oh, great -- forum doesn't nest quotes properly when replying. Well, I hope get this right...)

Originally posted by calris

It does have a license, and it is as permissive as they come

Yup, noted in an edit of my original. I guess you might have been replying while I was editing

.

Is that wording standard MIT license? The important bit here is that it's actually GPL-compatible and known to be.

Originally posted by calris

The issues are:

Sounds are defined by a string name that we have to keep referring to while we are processing the sounds preferences. Every time we process a sound entry in the pref file, we need to match it to an id (or create a new id for the sound). The sound id is what is used to store which sounds are used for given messages (i.e. we do not store the 'name' of the sound, otherwise we would end up with a ton of allocated strings to hold the names.
We don't come across sound names in the pref files in alphabetical order, so if we store them in an array, we will be doing a lot of linear searches using strcmp()

Use a hash-of-the-string for a quick rejection check when searching -- that's constant time and you don't incur the pointer indirection forced by strcmp() unless you need to. (Which I noted in my message.)

Originally posted by calris

Remember, we can process a pref file at ANY time during game play - it's not a one-off thing. We need the sound name/id map to stick around in case with mess with the message/sound map.

So you need to possibly add entries... so you might need to resize the array, but other than that I don't really how this affects the use of an array. (See below.)

Originally posted by calris

Hence why I 'stole' the code

Indeed. (And that's a good, thing, don't get me wrong.)

Originally posted by calris

It's more than just how much data you're storing - it's also about how often you need to perform operations on the data (find, insert, delete)

Indeed, but my point was that all of those operations are faster on an array than on a tree -- assuming a non-huge number of elements. ('Insert' needs to resize sometimes, but that can be amortized by using exponentially increasing increments, and 'delete' just needs to mark elements as deleted rather than moving elements around. Obviously 'find' and 'insert' need to account for that marker.)

Aside: If you want a tree which is more cache friendly, etc. you'll probably want something more like a B+ tree -- or a cache-oblivious tree. (Yes, B+ trees were originally for use with disks, but these days you CPU cache hierarchy has very similar properties to disk I/O. The latencies are obviously lower overall, but there's an order-of-magnitude thing going on, and prefetching means that the larger nodes are basically free.)

Originally posted by calris

Don't forget that we are dealing with arbitrary, randomly ordered, variable length, string keys here.

Doesn't matter -- if you use a hash of the key in each entry for a quick rejection check, as suggested.

Originally posted by calris

So if we use a binary search on an array of struct containing the key/data pair, the keys are going to be splattered all over memory anyway because of the extra level of indirection need for the char *sound_name

You shouldn't use a binary search on the array! (Which I believe I also mentioned, but it may have been in an edit.)

The point is to avoid jumping around in memory and a binary search means that prefetching goes out the window. (There are also some surprising cache-related subtleties around binary search, btw.)

The throughput of the CPU-RAM (given prefetching) is such that a linear search is faster. (Again, assuming non-huge numbers of items.)

Originally posted by calris

But now Nick is thinking forward to the possibility of moving message strings into a data file - the number of messages if going to grow to about 350 or so. That's going to be something like 20k of MOSTLY unused memory for the message/sound map.

NOTE: I have a background in embedded development - for me, every byte counts. I'm more than happy to ditch the message<->sound tree in favour of a fixed 20kB array, but I'll need to be convinced on the sound name<->sound id map.

Your phone probably has >1G. End of story regarding memory usage, AFAICS

.

(I don't mean to be disparaging. You'd definitely be right if we were talking "real"[1] embedded, but we aren't.)

[1] As opposed to the "faux" embedded I used to do where "embedded" really just meant a Linux running on a machine with about a third of what a regular PC of the time had... and stuffing said (small form-factor) machine into some enclosure along with a few other bits of peripheral hardware.

**AnonymousHero** · March 26, 2016, 09:02

Since you mentioned it: On the subject of realloc(), I think mem_realloc() is buggy. I you see L76: This seems incorrect in that the addition here happens before the NULL check... thus ensuring that the "NULL" case can never be triggered. (It's usually not a problem on Linux due to overcommit, but malloc() can sometimes return NULL even if overcommit is allowed.)

It also seems to be attempting some sort of "poisoning"/size tracking to protect against corruption, but this is empatically not what one should be doing these days. Just use UBSAN and ASAN. They do this fully automatically and work for all memory accesses, including regular array indexing.

EDIT: Fixed URL

**calris** · March 26, 2016, 09:04

Originally posted by AnonymousHero

Use a hash-of-the-string for a quick rejection check when searching -- that's constant time and you don't incur the pointer indirection forced by strcmp() unless you need to. (Which I noted in my message.)

Ah, now I get it - Hash the string into, say, a u32 and just shove it into an array (which we can realloc be some fixed chunk size) and forget about sorting, just do a linear search. So all we need is either

A hash algorithm which guarantees no duplicate keys for non-equal strings, or
Some way to manage potential duplicate keys.

I'm open to any ideas you have to offer.

I can immediately replace the message<->sound id tree with a fixed size array - It makes my skin crawl to waste the memory, but performance is king here

RFC: Reworking sound sub-system

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment