Angband 3.5 vs 4.0 performance

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • fizzix
    Prophet
    • Aug 2009
    • 3025

    Angband 3.5 vs 4.0 performance

    So, 4.0 is slower than 3.5. For the most part this is not terribly relevant to playing experience. Both are fast enough to play. However there are two situations where the lag is noticeable. The first is when resting for a long number of turns. The second is if you try to run statistics.

    To try to figure out what's going on, I got profiles up and running and ran a couple tests to try and figure out where the lags are. First we'll look at the top offenders for the statistics. In both cases I generated 50 complete runs through 1-99 and gathered the stats. Here's 4.0

    Code:
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  ms/call  ms/call  name    
      9.89     21.56    21.56 783537519     0.00     0.00  flag_has_dbg
      9.30     41.83    20.27                             _find_in_range
      8.36     60.06    18.23 17407127     0.00     0.00  lookup_kind
      7.67     76.79    16.73                             join_region
      5.90     89.66    12.87 526606789     0.00     0.00  flag_off
      4.97    100.49    10.83 117488390     0.00     0.00  Rand_div
      4.90    111.18    10.69 305731848     0.00     0.00  i_to_yx
      3.77    119.40     8.22 94656879     0.00     0.00  square_set_feat
      3.12    126.21     6.81   266591     0.03     0.06  get_mon_num
      2.48    131.62     5.41                             ignore_point
      2.24    136.51     4.89                             build_colors
      2.01    140.90     4.39 103697301     0.00     0.00  square_ispassable
      1.95    145.15     4.25 94931265     0.00     0.00  q_push
      1.94    149.38     4.23 490669419     0.00     0.00  yx_to_i
      1.92    153.57     4.19 93471309     0.00     0.00  square_isfloor
      1.63    157.12     3.55   335260     0.01     0.08  apply_magic
      1.48    160.35     3.23 82383120     0.00     0.00  q_pop
      1.47    163.55     3.20 132785972     0.00     0.00  mem_alloc
      1.36    166.51     2.96 174276693     0.00     0.00  cave_generate
    Here's 3.5

    Code:
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  ms/call  ms/call  name    
     13.51      7.80     7.80 497556223     0.00     0.00  flag_has_dbg
     12.37     14.94     7.14 103167463     0.00     0.00  Rand_div
     11.04     21.31     6.37                             ignore_point
      9.70     26.91     5.60                             join_regions
      6.46     30.64     3.73   268057     0.01     0.04  get_mon_num
      5.56     33.85     3.21  6493759     0.00     0.00  create_mask
      3.00     35.58     1.73 59482400     0.00     0.00  q_pop
      2.96     37.29     1.71 103416854     0.00     0.00  cave_set_feat
      2.95     38.99     1.70                             default_gen
      2.72     40.56     1.57 104298408     0.00     0.00  cave_ispassable
      2.70     42.12     1.56 65076388     0.00     0.00  q_push
      2.30     43.45     1.33     5022     0.26     0.40  cave_generate
      1.87     44.53     1.08 104371647     0.00     0.00  cave_isvault
      1.78     45.56     1.03                             cavern_gen
      1.68     46.53     0.97                             _find_in_range
      1.40     47.34     0.81 89942052     0.00     0.00  cave_isfloor
      1.35     48.12     0.78   334691     0.00     0.01  apply_magic
      1.06     48.73     0.61 76212215     0.00     0.00  cave_isdoor
      0.90     49.25     0.52                             get_mon_race_aux
    From this already it should be clear that 3.5 is faster. It turns out that it's actually about 4 times faster (57 seconds to 218). Not disastrous, but it makes stat gathering and balancing a bit more difficult.

    Here's the results for resting. In both cases I went down one flight and rested 1000 turns. Here's 4.0

    Code:
     time   seconds   seconds    calls  ms/call  ms/call  name    
     24.07      1.36     1.36 138774747     0.00     0.00  flag_has_dbg
     15.22      2.22     0.86 13928040     0.00     0.00  grid_data_as_text
     14.87      3.06     0.84 13264329     0.00     0.00  map_info
     11.50      3.71     0.65     1014     0.64     5.16  display_map
      4.42      3.96     0.25 27686739     0.00     0.00  feat_is_wall
      3.89      4.18     0.22 13264329     0.00     0.00  square_isdedge
      3.72      4.39     0.21 13251117     0.00     0.00  square_ismark
      3.19      4.57     0.18 11910801     0.00     0.00  feat_is_quartz
      2.83      4.73     0.16 13424248     0.00     0.00  square_isseen
      2.65      4.88     0.15 13289174     0.00     0.00  square_istrap
      2.48      5.02     0.14 26173270     0.00     0.00  square_object
      1.95      5.13     0.11 12011200     0.00     0.00  feat_is_magma
      1.59      5.22     0.09 13928040     0.00     0.00  feat_is_treasure
      1.06      5.28     0.06   806907     0.00     0.00  Term_queue_char
      1.06      5.34     0.06     1000     0.06     0.14  process_world
      0.53      5.37     0.03   591342     0.00     0.00  square_isno_esp
      0.53      5.40     0.03    87336     0.00     0.00  my_stristr
      0.53      5.43     0.03    24904     0.00     0.00  Term_fresh
      0.35      5.45     0.02  1844131     0.00     0.00  square_isglow
    and here's 3.5

    Code:
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  ms/call  ms/call  name    
     39.18      0.67     0.67     1017     0.66     1.46  display_map
     38.60      1.33     0.66 13277845     0.00     0.00  map_info
      7.60      1.46     0.13 13278967     0.00     0.00  get_first_object
      2.34      1.50     0.04    26333     0.00     0.00  parser_parse
      1.75      1.53     0.03   640224     0.00     0.00  Term_queue_char
      1.17      1.55     0.02    87335     0.00     0.00  my_stristr
      1.17      1.57     0.02    11169     0.00     0.00  create_mask
      0.58      1.58     0.01   129954     0.00     0.00  cave_monster_max
      0.58      1.59     0.01   108363     0.00     0.00  mem_alloc
      0.58      1.60     0.01    81990     0.00     0.00  flag_on_dbg
      0.58      1.61     0.01    68687     0.00     0.00  my_strcpy
      0.58      1.62     0.01    32143     0.00     0.00  cave_set_feat
      0.58      1.63     0.01    28705     0.00     0.00  Term_erase
      0.58      1.64     0.01     4801     0.00     0.00  eval_r_power
      0.58      1.65     0.01     3145     0.00     0.00  Term_fresh
      0.58      1.66     0.01     1022     0.01     0.01  my_strcat
      0.58      1.67     0.01     1008     0.01     0.02  object_desc
      0.58      1.68     0.01        1    10.00    49.57  play_game
      0.58      1.69     0.01        1    10.00    10.00  reset_visuals
    Again, 3.5 is faster 1.7s to 5.6. The immediate thing I noticed is how many more times "flag_has_dbg" is called in 4.0. So I guess that's a first place to look.

    If anyone wants the full log files, which have much more information than what's here, let me know and I'll get them to you.
  • Nick
    Vanilla maintainer
    • Apr 2007
    • 9637

    #2
    This is excellent. What profiler were you using, and on what OS?
    One for the Dark Lord on his dark throne
    In the Land of Mordor where the Shadows lie.

    Comment

    • fizzix
      Prophet
      • Aug 2009
      • 3025

      #3
      gprof on windows7.

      Comment

      • Nick
        Vanilla maintainer
        • Apr 2007
        • 9637

        #4
        Looks like having the overhead map open in a subwindow is giving the biggest, erm, overhead. I've done a fix that improves that when resting.
        One for the Dark Lord on his dark throne
        In the Land of Mordor where the Shadows lie.

        Comment

        • fizzix
          Prophet
          • Aug 2009
          • 3025

          #5
          Originally posted by Nick
          Looks like having the overhead map open in a subwindow is giving the biggest, erm, overhead. I've done a fix that improves that when resting.
          I don't think I had a subwindow with overhead map on though... I think I had, monster knowledge, monster list, object list, prev. messages for both 3.5 and 4.0

          I can try it again tonight with no subwindows open if that'd be useful.

          Comment

          • Nick
            Vanilla maintainer
            • Apr 2007
            • 9637

            #6
            Originally posted by fizzix
            I don't think I had a subwindow with overhead map on though... I think I had, monster knowledge, monster list, object list, prev. messages for both 3.5 and 4.0

            I can try it again tonight with no subwindows open if that'd be useful.
            It looked to me like update_minimap_window was the dominant thing in the call graph, for all the runs except 3.5 stats. In any case, I think I have sped up the resting at least, so it's a win for profiling already.
            One for the Dark Lord on his dark throne
            In the Land of Mordor where the Shadows lie.

            Comment

            • fizzix
              Prophet
              • Aug 2009
              • 3025

              #7
              Originally posted by Nick
              It looked to me like update_minimap_window was the dominant thing in the call graph, for all the runs except 3.5 stats. In any case, I think I have sped up the resting at least, so it's a win for profiling already.
              Ok, it's fairly easy to generate these profiles now that I've figured out how to get it to work properly. So if you want me to run more tests, let me know. My guess though is that you can do it fairly easily on openSuse, since at least my version has gprof. The steps are:

              recompile with -g -pg flags (might just need -pg, but I did both)
              open the game, do stuff
              run gprof on executable
              profit

              Comment

              • MadeOfBees
                Scout
                • Apr 2013
                • 44

                #8
                Are you using text or a tileset?

                Comment

                • fizzix
                  Prophet
                  • Aug 2009
                  • 3025

                  #9
                  Originally posted by MadeOfBees
                  Are you using text or a tileset?
                  ascii.....

                  Comment

                  • Nick
                    Vanilla maintainer
                    • Apr 2007
                    • 9637

                    #10
                    Originally posted by fizzix
                    Ok, it's fairly easy to generate these profiles now that I've figured out how to get it to work properly. So if you want me to run more tests, let me know. My guess though is that you can do it fairly easily on openSuse, since at least my version has gprof. The steps are:

                    recompile with -g -pg flags (might just need -pg, but I did both)
                    open the game, do stuff
                    run gprof on executable
                    profit
                    Not producing gmon.out for me. Where did you put the -g -pg flags?
                    One for the Dark Lord on his dark throne
                    In the Land of Mordor where the Shadows lie.

                    Comment

                    • AnonymousHero
                      Veteran
                      • Jun 2007
                      • 1393

                      #11
                      Originally posted by Nick
                      Not producing gmon.out for me. Where did you put the -g -pg flags?
                      Can't anwser for fizzix, but I think you'll need to supply both "-g -pg" to the compiler (aka via CFLAGS) and probably also to the linker (LDFLAGS? I seem to recall there was some wonkiness with the makefiles around the use of the linker. The makefile seems to be using a separate linked step whereas it should probably just use the "gcc" frontend to perform the linking. That would make this and the use of "sanitize" flags a lot easier.)

                      Comment

                      • fizzix
                        Prophet
                        • Aug 2009
                        • 3025

                        #12
                        Hmm, I'm not getting a gmon.out on OpenSuse either, maybe AnonymousHero's advice will work, I'm sort of flailing wildly. (I also can't compile latest master branch on OpenSuse, although earlier ones appear to work. Don't know why)

                        I'll check on windows when I get home.

                        Comment

                        • Nick
                          Vanilla maintainer
                          • Apr 2007
                          • 9637

                          #13
                          Originally posted by AnonymousHero
                          Can't anwser for fizzix, but I think you'll need to supply both "-g -pg" to the compiler (aka via CFLAGS) and probably also to the linker (LDFLAGS? I seem to recall there was some wonkiness with the makefiles around the use of the linker. The makefile seems to be using a separate linked step whereas it should probably just use the "gcc" frontend to perform the linking. That would make this and the use of "sanitize" flags a lot easier.)
                          Awesome, thank you.

                          For anyone else doing this on Linux, I just did this:
                          Code:
                          CFLAGS += [COLOR="Red"]-g -pg [/COLOR]-I. -std=c99 -Wdeclaration-after-statement -O0
                          [COLOR="Red"]LDFLAGS += -g -pg[/COLOR]
                          in src/Makefile.
                          Last edited by Nick; January 24, 2016, 01:38. Reason: Wrong makefile
                          One for the Dark Lord on his dark throne
                          In the Land of Mordor where the Shadows lie.

                          Comment

                          • AnonymousHero
                            Veteran
                            • Jun 2007
                            • 1393

                            #14
                            Originally posted by Nick
                            Awesome, thank you.

                            For anyone else doing this on Linux, I just did this:
                            Code:
                            CFLAGS += [COLOR="Red"]-g -pg [/COLOR]-I. -std=c99 -Wdeclaration-after-statement -O0
                            [COLOR="Red"]LDFLAGS += -g -pg[/COLOR]
                            in Makefile.src.
                            BONUS!

                            (I still mean the thing about the makefile being semi-broken about linking[1], though).

                            [1] If you're doing -fsanitize-ANYTHING, you're not going to be happy. The thing is that the "ld" flags are hidously complicated and non-cross-platform. You want to just say "cc -o blah *.o". Ask me for details.

                            Comment

                            Working...
                            😀
                            😂
                            🥰
                            😘
                            🤢
                            😎
                            😞
                            😡
                            👍
                            👎