why is 4.2.2 so slow

  • Mike
    Rookie
    • Mar 2021
    • 21

    #16
    Some timings

    I've got some numbers - not necessarily very helpful ones, but it's hard to find a repeatable test that is also representative of real, unpredictable behavior.



    Code:
    Xygos (branched a few months ago from 4.2.1)
                            O0       O2      O3 native   O3 lto native
    Startup/load/save/exit  0.361    0.229   0.270       0.245
    " + rest until hungry   1.343    0.670   0.645       0.445
    50 diving sims/level    32.514   13.827  11.000      6.665

    Today's V
                            O0       O2      O3 native   O3 lto native
    Startup/load/save/exit  0.277    -       -           0.189
    " + rest until hungry   0.669    -       -           0.261
    50 diving sims/level    35.131   -       -           7.847
    Unfortunately I was unable to get a working build of 4.1.3 to compare it with, but the take-away is that (if you can accept the diving sim as realistic, at least) Angband benefits by an unusual amount from compiler flags in general and LTO in particular - and by default on Linux you get -O0. My first impression of 4.2.1 was that it was slow (when compared to Frog - I hadn't played 4.1 recently), but it's more than fast enough with the LTO build.

    As to why this is: there are a lot of small functions that are called often from other files. The flag_has_dbg(), square() and loc() functions are among the worst offenders here. I don't know exactly how much time they take (profiling adds too much overhead to function calls to say), but I do know that with -O3 -flto and profiling, around 70% of the time is spent in the two top-level functions update_view() and process_world(), while profiling -O3 without LTO shows a small-function-heavy profile similar to -O0.
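To illustrate the point about small functions called from other files, here is a minimal sketch. The names `demo_loc`, `demo_flag_has` and `demo_count_flagged`, and the bit layout, are hypothetical stand-ins and are not Angband's actual definitions. The idea is only that a tiny helper defined in a separate .c file costs a real call from every other file unless the build uses -flto, whereas the same helper written as `static inline` in a header can be inlined at -O2 or -O3 without LTO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not Angband's real code. */
struct demo_loc { int x; int y; };

/*
 * If this lived in its own .c file, callers in other files would pay for a
 * real function call unless the build used link-time optimization. Written
 * as static inline (typically in a header), the compiler can inline it.
 */
static inline struct demo_loc demo_loc(int x, int y)
{
    struct demo_loc grid = { x, y };
    return grid;
}

/* Test one flag; flags are numbered from 1 and packed 8 per byte (assumed). */
static inline bool demo_flag_has(const uint8_t *flags, size_t size, int flag)
{
    size_t offset = (size_t)(flag - 1) / 8;
    uint8_t bit = (uint8_t)(1u << ((flag - 1) % 8));

    assert(flag >= 1 && offset < size);
    return (flags[offset] & bit) != 0;
}

/* A hot loop of the kind that calls such helpers many times per game turn. */
static int demo_count_flagged(const uint8_t *flags, size_t size, int nflags)
{
    int count = 0;

    for (int f = 1; f <= nflags; f++) {
        if (demo_flag_has(flags, size, f)) count++;
    }
    return count;
}
```

Whether moving helpers into headers is preferable to simply building with -flto is a design choice; the timings above only show that one or the other matters for this code base.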


    These were built with gcc 10.2 for Linux and used the X11 (ascii) interface.

    Comment

    • Nick
      Vanilla maintainer
      • Apr 2007
      • 9634

      #17
      Originally posted by Sky
      ok but, why isn't this happening in 4.1.3 ?
      Dunno, I assume it's one of the 1200 changes since then. I'll get back to you when I've found it
      One for the Dark Lord on his dark throne
      In the Land of Mordor where the Shadows lie.

      Comment

      • Pete Mack
        Prophet
        • Apr 2007
        • 6883

        #18
        @Nick-
        Has the rendering code actually changed since then? That should be the single most stable component. Has the default window display changed? My first guess is that the overview map has become included. If that were the case, you'd need cached images for the map tile size as well as the main screen tile size.

        Comment

        • Cuboideb
          Adept
          • May 2020
          • 196

          #19
          Here is a short demo of the Windows port:



          The first part is the current behavior.

          In the second part I commented out the calls to Term_mark() in main-win.c.

          The flicker goes away, but the double-sized tiles are cut off, leaving the upper tile behind. The opposite is possible too: only the lower tile is visible and the upper part is overwritten.

          Using Term_mark() isn't a bug, but the drawing of double-sized tiles should be redesigned.

          Comment

          • Pete Mack
            Prophet
            • Apr 2007
            • 6883

            #20
            Yikes, that is ugly.

            Comment

            • Cuboideb
              Adept
              • May 2020
              • 196

              #21
              I have to take a deeper look but perhaps the win port needs some kind of double buffering.

              Comment

              • Pete Mack
                Prophet
                • Apr 2007
                • 6883

                #22
                It's a shame that Angband is so reliant on Win32. Unfortunately the good Windows gfx are tightly bound to Visual Studio via C++ name mangling, and would require extern "C" wrappers for every call.

                Comment

                • backwardsEric
                  Knight
                  • Aug 2019
                  • 527

                  #23
                  All of the front ends that support the double-height tiles (Windows, SDL, SDL2, and Mac) use Term_mark(), and they'll all redraw the upper half in cases when it isn't necessary (the contents ui-term.c knows about in the upper half location didn't change and the double-height tile below didn't change as well). Making ui-term.c's change tracking aware of what are double-height tiles would help all the front ends and largely avoid the symptoms that double-buffering the Windows front-end is trying to hide.

                  One of the reasons this is likely worse in 4.2.1 and later compared to 4.1 is this change, https://github.com/angband/angband/c...3b3a454f5bac4b , which was made so that Term_mark() called from the drawing hooks would have the intended effect.
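A rough sketch of what change tracking that knows about double-height tiles could look like. Everything here is hypothetical: `struct demo_cell` and `demo_plan_redraw` are made-up names and ui-term.c's real data structures are different. The assumed rule is that a double-height tile and the cell above it are redrawn together, and only when one of them actually needs it, so an unchanged tile below an unchanged cell costs nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cell state; ui-term.c's real structures differ. */
struct demo_cell {
    bool changed;      /* contents differ from what is on screen */
    bool double_high;  /* the tile drawn here also covers the cell above */
};

/*
 * Decide which cells of a w x h grid (row-major, row 0 at the top) need to
 * be redrawn. A double-height tile and the cell above it are always redrawn
 * as a pair, but only if at least one of the two needs redrawing. The loop
 * repeats until nothing new is marked, so stacked double-height tiles are
 * handled. Returns the number of cells to redraw; drawing them from top to
 * bottom lets each lower tile overlay the cell above it.
 */
static int demo_plan_redraw(const struct demo_cell *cells, int w, int h,
                            bool *redraw)
{
    int count = 0;
    bool grew = true;

    assert(cells && redraw && w > 0 && h > 0);
    for (int i = 0; i < w * h; i++) redraw[i] = cells[i].changed;

    while (grew) {
        grew = false;
        for (int y = 1; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int below = y * w + x;
                int above = below - w;

                if (!cells[below].double_high) continue;
                if (redraw[below] != redraw[above]) {
                    redraw[below] = redraw[above] = true;
                    grew = true;
                }
            }
        }
    }

    for (int i = 0; i < w * h; i++) {
        if (redraw[i]) count++;
    }
    return count;
}
```

With a one-column grid where only the bottom cell changed and an unchanged double-height tile sits above it, this marks just the one cell, which is the unnecessary upper-half redraw described above being avoided.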

                  Comment

                  • Pete Mack
                    Prophet
                    • Apr 2007
                    • 6883

                    #24
                    @backwards
                    But surely double buffering will also get rid of the individual monster movement visible in that video, even in the one without marking?

                    Comment


                      • backwardsEric
                        Knight
                        • Aug 2019
                        • 527

                        #26
                        Originally posted by Pete Mack
                        @backwards
                        But surely double buffering will also get rid of the individual monster movement visible in that video, even in the one without marking?
                        I think that would depend on how the double buffering was implemented. A simple approach, always performing a buffer swap in response to TERM_XTRA_FRESH in Term_xtra_win(), should show those movements (while processing the monsters until the player has enough energy to act, there'd be a swap for each EVENT_REFRESH or explicit Term_refresh() on the main window). If the response to TERM_XTRA_FRESH has some rate limiting (main-sdl2.c seems to do that) or sets a flag that the back buffer had been updated and signals another thread which will only do the buffer swap when that flag is set and the monitor is ready to refresh, then those movements should be filtered out if the monster processing loop is fast compared to refresh rate.
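The rate-limited variant could be sketched roughly as follows. This is not taken from main-win.c or main-sdl2.c; `struct demo_presenter`, `demo_request_fresh` and `demo_maybe_swap` are hypothetical names, and the caller is assumed to supply a millisecond clock. A fresh request only sets a flag, and the swap happens at most once per interval, so several monster moves between two swaps collapse into one visible update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; not taken from any Angband front end. */
struct demo_presenter {
    bool dirty;             /* back buffer has changes not yet shown */
    uint32_t last_swap_ms;  /* time of the most recent swap */
    uint32_t min_gap_ms;    /* minimum time between swaps, e.g. 16 for ~60 Hz */
    int swaps;              /* number of swaps performed, for illustration */
};

/*
 * Called where the front end would handle a "fresh" request. It only records
 * that the back buffer changed; it does not swap.
 */
static void demo_request_fresh(struct demo_presenter *p)
{
    assert(p);
    p->dirty = true;
}

/*
 * Called periodically (timer, idle loop or a separate thread). Swaps only if
 * something changed and at least min_gap_ms has passed since the last swap.
 * Returns true if a swap was performed.
 */
static bool demo_maybe_swap(struct demo_presenter *p, uint32_t now_ms)
{
    assert(p);
    if (!p->dirty) return false;
    if ((uint32_t)(now_ms - p->last_swap_ms) < p->min_gap_ms) return false;

    /* A real front end would blit or flip the back buffer here. */
    p->dirty = false;
    p->last_swap_ms = now_ms;
    p->swaps++;
    return true;
}
```

The simple approach described first corresponds to swapping directly inside `demo_request_fresh()`, which would show every intermediate movement.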

                        Comment

                        • DavidMedley
                          Veteran
                          • Oct 2019
                          • 1004

                          #27
                          I did rejigger the HP and SP recovery functions in 4.2.1, and I'm a C novice, but I wouldn't think it could do much to slow the game down.
                          Please like my indie game company on Facebook! https://www.facebook.com/RatherFunGames

                          Comment

                          • backwardsEric
                            Knight
                            • Aug 2019
                            • 527

                            #28
                            The latest nightly builds, https://github.com/angband/angband/releases , include changes that avoid using Term_mark() for handling double-height tiles. The Windows, SDL, SDL2, and Mac front ends all make use of that. Feedback about whether that makes the performance with the Shockbolt tiles good enough or if more still needs to be done would be helpful, as would reports about any rendering artifacts.

                            Comment

                            • Sky
                              Veteran
                              • Oct 2016
                              • 2321

                              #29
                              ok i'll let you know.

                              Am i really the only person here on W10 x64 ?
                              "i can take this dracolich"

                              Comment

                              • Werbaer
                                Adept
                                • Aug 2014
                                • 182

                                #30
                                Originally posted by Sky
                                Am i really the only person here on W10 x64 ?
                                Maybe you're the only one on that system using tiles instead of ASCII.

                                Comment
