Some timings
I've got some numbers. They're not necessarily very helpful ones, because it's hard to find a repeatable test that is also representative of real, unpredictable behavior.
Unfortunately I was unable to get a working build of 4.1.3 to compare against. The take-away is that (if you can accept the diving sim as realistic, at least) Angband benefits an unusual amount from compiler optimization flags in general and LTO in particular, and by default on Linux you get -O0. My first impression of 4.2.1 was that it was slow (compared to Frog; I hadn't played 4.1 recently), but it's more than fast enough with the LTO build.
As to why this is: there are a lot of small functions that are called very often from other files. flag_has_dbg(), square() and loc() are among the worst offenders. I don't know exactly how much time they take, because profiling adds too much overhead to function calls to say. I do know that when profiling an -O3 -flto build, around 70% of the time is spent in the two top-level functions update_view() and process_world(), while profiling an -O3 build without LTO shows a small-function-heavy profile similar to -O0.
These were built with gcc 10.2 for Linux and used the X11 (ascii) interface.
Code:
Xygos (branched a few months ago from 4.2.1)
                           O0       O2       O3 native   O3 lto native
Startup/load/save/exit     0.361    0.229    0.270       0.245
" + rest until hungry      1.343    0.670    0.645       0.445
50 diving sims/level       32.514   13.827   11.000      6.665

Today's V
                           O0       O2       O3 native   O3 lto native
Startup/load/save/exit     0.277    -        -           0.189
" + rest until hungry      0.669    -        -           0.261
50 diving sims/level       35.131   -        -           7.847