Using mtrace - z-virt is considered harmful

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • calris
    Adept
    • Mar 2016
    • 194

    Using mtrace - z-virt is considered harmful

    So I decided to run mtrace on Angband to see if any of my sound module code leaked memory.

    Turns out Angband leaks like a sieve, but because of z-virt, it is impossible to figure out where the leaks are. mtrace tells me all the unfreed memory was allocated in z-virt.c. Even inlining the mem_*() functions didn't help (now there all allocated in z-virt.h)

    So I macro'd mem_alloc to malloc etc, etc. Angband crashed immediately

    So z-virt is harmfull for two reasons:
    1. It makes it impossible to use mtrace, and;
    2. It's obviously providing cover for some poor coding practices


    I know checking for NULL after malloc/zalloc is a PITA (there are ~400 instances in the source), but it would allow better memory debugging using standard tools.
  • calris
    Adept
    • Mar 2016
    • 194

    #2
    Speaking of bad programming practices, I came across this gem in cave.c:

    Code:
    c->feat_count = mem_zalloc((z_info->f_max + 1) * sizeof(int));
    Sure, c->feat_count is currently in int and is unlikely to ever not be, but if there is a lot more of this in the code, it might cause a few headaches

    Comment

    • Therem Harth
      Knight
      • Jan 2008
      • 926

      #3
      Hmm. A thought, maybe off target...

      Perhaps Angband could eventually offload its memory allocation and data structure stuff to a common third-party library, such as APR or GLib?

      I know that extra dependencies are not desirable, but both of the above are fairly small.

      Comment

      • redlumf
        Scout
        • Aug 2015
        • 25

        #4
        I've tried valgrind (on 4.0.1)

        Code:
        valgrind --tool=memcheck --leak-check=full ./angband -mgcu
        Found two errors.
        Code:
        =3920== 40 bytes in 1 blocks are definitely lost in loss record 27 of 95
        ==3920==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
        ==3920==    by 0x4E07F8: mem_alloc (z-virt.c:41)
        ==3920==    by 0x4E0861: mem_zalloc (z-virt.c:54)
        [COLOR="Red"]==3920==    by 0x4DA58D: dice_new (z-dice.c:197)[/COLOR]
        ==3920==    by 0x438F3D: parse_class_dice (init.c:3527)
        ==3920==    by 0x47AF54: parser_parse (parser.c:335)
        ==3920==    by 0x47B8D6: parse_file (parser.c:671)
        ==3920==    by 0x43947B: run_parse_class (init.c:3636)
        ==3920==    by 0x47B72E: run_parser (parser.c:629)
        ==3920==    by 0x43ACDD: init_arrays (init.c:4365)
        ==3920==    by 0x43AD9D: init_angband (init.c:4441)
        ==3920==    by 0x4E1635: main (main.c:546)
        ==3920== 
        ==3920== 48 bytes in 3 blocks are definitely lost in loss record 44 of 95
        ==3920==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
        ==3920==    by 0x4E07F8: mem_alloc (z-virt.c:41)
        ==3920==    by 0x4E0988: string_make (z-virt.c:98)
        [COLOR="red"]==3920==    by 0x49611F: parse_buy_flag (store.c:261)
        [/COLOR]==3920==    by 0x47AF54: parser_parse (parser.c:335)
        ==3920==    by 0x47B8D6: parse_file (parser.c:671)
        ==3920==    by 0x49628D: run_parse_stores (store.c:287)
        ==3920==    by 0x47B72E: run_parser (parser.c:629)
        ==3920==    by 0x49637F: store_init (store.c:334)
        ==3920==    by 0x43AD9D: init_angband (init.c:4441)
        ==3920==    by 0x4E1635: main (main.c:546)
        ==3920== 
        ==3920== LEAK SUMMARY:
        ==3920==    definitely lost: 88 bytes in 4 blocks
        ==3920==    indirectly lost: 0 bytes in 0 blocks
        ==3920==      possibly lost: 0 bytes in 0 blocks
        ==3920==    still reachable: 296,656 bytes in 392 blocks
        ==3920==         suppressed: 0 bytes in 0 blocks

        Comment

        • takkaria
          Veteran
          • Apr 2007
          • 1951

          #5
          Originally posted by calris
          So I decided to run mtrace on Angband to see if any of my sound module code leaked memory.

          Turns out Angband leaks like a sieve, but because of z-virt, it is impossible to figure out where the leaks are. mtrace tells me all the unfreed memory was allocated in z-virt.c. Even inlining the mem_*() functions didn't help (now there all allocated in z-virt.h)

          So I macro'd mem_alloc to malloc etc, etc. Angband crashed immediately
          That's interesting - I can't reproduce this. It would be good to know what's causing these crashes. I used the following macros: (on OS X)

          Code:
          #define mem_alloc malloc
          #define mem_free free
          #define mem_realloc realloc
          #define mem_zalloc(x) calloc(1, x)
          Last edited by takkaria; March 28, 2016, 15:18.
          takkaria whispers something about options. -more-

          Comment

          • AnonymousHero
            Veteran
            • Jun 2007
            • 1393

            #6
            Originally posted by calris
            I know checking for NULL after malloc/zalloc is a PITA (there are ~400 instances in the source), but it would allow better memory debugging using standard tools.
            It should be possible to define wrapper macros for some of this. (Macros aren't great either, but sometimes they're better than tons of boilerplate... not that I'd necessarily consider 400 instances of null checks disastrous. Angband is... what, 100k+ lines?)

            Btw, you might want to try compiling with ASAN (see "-fsantize=address" in the GCC/Clang options) instead. Perhaps that works better. (I know that it works pretty damn well on ToME 2.x. If you have debug information I think it even gives a full stack trace for any leaks.). IIRC you'll have to fiddle a bit with the makefiles to have them use "$CC" to link rather than "$LD", but other than that it should be straightforward.

            EDIT: ASAN is also way ahead of valgrind, at the very least in terms of performance. It's actually feasible to just always compile with ASAN. (Not just in debug mode.)

            Comment

            • calris
              Adept
              • Mar 2016
              • 194

              #7
              Originally posted by takkaria
              That's interesting - I can't reproduce this. It would be good to know what's causing these crashes. I used the following macros: (on OS X)

              Code:
              #define mem_alloc malloc
              #define mem_free free
              #define mem_realloc realloc
              #define mem_zalloc(x) calloc(1, x)
              I just gave it another go and it works now. Not sure what I did before

              Comment

              • calris
                Adept
                • Mar 2016
                • 194

                #8
                I added -fsanitize=address and -g3 (for good measure) to CFLAGS and -lasan into LIBS (hacked buildsys.mk - need to find a permanent place to put it)

                I've reverted back to the mem_*() functions

                Angband is VERY playable - I couldn't notice a speed penalty

                I think mtrace() is a bit too excitable - muntrace() is called before the end of main() so it picks up everything the C library has internally allocated. ASAN is much fairer - Angband doesn't leak nearly as much as I thought.

                Here's what I've found so far:

                - There are some resource leaks in x11 - I can fix most of them, but a couple elude me.

                - SDL Sound leaks 64 bytes, but it looks like this is buried in the SDL library (SDL_AllocRW)

                - Existing character, starting in town, quit immediately:
                • parse_buy_flag() - store.c:262
                • parse_class_dice() - z-dice.c:197


                - Existing character, starting in town, descend, search dungeon, learn a new spell, return to town:
                • parse_buy_flag() - store.c:262
                • parse_class_dice() - z-dice.c:197
                • spell_collect_from_book() - player-spell.c:199
                • object_new() - obj-pile.c:200
                • cmd_set_arg_string() - cmd-core.c:492


                - Starting a new character
                • parse_buy_flag() - store.c:262
                • parse_class_dice() - z-dice.c:197
                • object_new() - obj-pile.c:200
                • rd_player() - load.c:676
                • player_quests_reset() - player-quest.c:163
                • player_spells_init() - player-spell.c:135
                • rd_item() - load.c:101

                Comment

                • calris
                  Adept
                  • Mar 2016
                  • 194

                  #9
                  Originally posted by Therem Harth
                  Hmm. A thought, maybe off target...

                  Perhaps Angband could eventually offload its memory allocation and data structure stuff to a common third-party library, such as APR or GLib?

                  I know that extra dependencies are not desirable, but both of the above are fairly small.
                  Abstracting memory allocation and deallocation makes debugging harder, not easier. Most debugging tools (like ASAN for example), rely on overloading malloc and free. If these get buried in a library, they can become impossible to trace.

                  For example, here's what ASAN reports about dice_new() leakage:
                  Code:
                  Direct leak of 40 byte(s) in 1 object(s) allocated from:
                      #0 0x7f43b19bb96a in malloc (/lib64/libasan.so.2+0x9896a)
                      #1 0x62c3d8 in mem_alloc /home/gruss/Angband_devel/angband/src/z-virt.c:41
                      #2 0x62c461 in mem_zalloc /home/gruss/Angband_devel/angband/src/z-virt.c:54
                      #3 0x61d50c in dice_new /home/gruss/Angband_devel/angband/src/z-dice.c:197
                      #4 0x48c550 in parse_class_dice /home/gruss/Angband_devel/angband/src/init.c:3573
                      #5 0x52bf9e in parser_parse /home/gruss/Angband_devel/angband/src/parser.c:335
                      #6 0x52d77d in parse_file /home/gruss/Angband_devel/angband/src/parser.c:671
                      #7 0x48cd31 in run_parse_class /home/gruss/Angband_devel/angband/src/init.c:3682
                      #8 0x52d508 in run_parser /home/gruss/Angband_devel/angband/src/parser.c:629
                      #9 0x48fcc2 in init_arrays /home/gruss/Angband_devel/angband/src/init.c:4412
                      #10 0x48fe50 in init_angband /home/gruss/Angband_devel/angband/src/init.c:4488
                      #11 0x62dde3 in main /home/gruss/Angband_devel/angband/src/main.c:531
                      #12 0x7f43affb557f in __libc_start_main (/lib64/libc.so.6+0x2057f)
                  And here's what it tells me about the SDL leak:

                  Code:
                  Direct leak of 64 byte(s) in 1 object(s) allocated from:
                      #0 0x7f43b19bb96a in malloc (/lib64/libasan.so.2+0x9896a)
                      #1 0x7f43b0ad443d in SDL_AllocRW (/lib64/libSDL-1.2.so.0+0x1043d)

                  Comment

                  • Therem Harth
                    Knight
                    • Jan 2008
                    • 926

                    #10
                    Originally posted by calris
                    Abstracting memory allocation and deallocation makes debugging harder, not easier. Most debugging tools (like ASAN for example), rely on overloading malloc and free. If these get buried in a library, they can become impossible to trace.
                    Huh. Thanks, that is useful to know.

                    Although, I thought that LD_PRELOAD libraries took precedence over other shared libs, and would therefore override in those too?

                    For instance, with sudo's NOEXEC functionality, the library sudo_noexec.so overloads execv(), execl(), and related C library functions. This works even for programs that offload process execution to a shared library.

                    Is LD_PRELOAD not being used here, or is there something else I'm missing?

                    Edit: oh N/M, I see what you're saying. Though I'm surprised it doesn't show the full stack trace.

                    Comment

                    • calris
                      Adept
                      • Mar 2016
                      • 194

                      #11
                      Originally posted by Therem Harth
                      Edit: oh N/M, I see what you're saying. Though I'm surprised it doesn't show the full stack trace.
                      It probably would if you build the libraries with debug info. Release libraries are stripped bare. But yeah, I don't get why there is no call trace - maybe the memory was allocated by a worker thread, in which case all bets are off

                      Comment

                      • calris
                        Adept
                        • Mar 2016
                        • 194

                        #12
                        This one has me stumped:

                        Code:
                        Direct leak of 40 byte(s) in 1 object(s) allocated from:
                            #0 0x7f9da5f6796a in malloc (/lib64/libasan.so.2+0x9896a)
                            #1 0x62c3f9 in mem_alloc /home/gruss/Angband_devel/angband/src/z-virt.c:41
                            #2 0x62c482 in mem_zalloc /home/gruss/Angband_devel/angband/src/z-virt.c:54
                            #3 0x61d5e9 in dice_new /home/gruss/Angband_devel/angband/src/z-dice.c:197
                            #4 0x48c636 in parse_class_dice /home/gruss/Angband_devel/angband/src/init.c:3573
                            #5 0x52c087 in parser_parse /home/gruss/Angband_devel/angband/src/parser.c:335
                            #6 0x52d866 in parse_file /home/gruss/Angband_devel/angband/src/parser.c:671
                            #7 0x48ce17 in run_parse_class /home/gruss/Angband_devel/angband/src/init.c:3682
                            #8 0x52d5f1 in run_parser /home/gruss/Angband_devel/angband/src/parser.c:629
                            #9 0x48fda8 in init_arrays /home/gruss/Angband_devel/angband/src/init.c:4412
                            #10 0x48ff36 in init_angband /home/gruss/Angband_devel/angband/src/init.c:4488
                            #11 0x62dfc4 in main /home/gruss/Angband_devel/angband/src/main.c:553
                            #12 0x7f9da456157f in __libc_start_main (/lib64/libc.so.6+0x2057f)
                        The dice allocated in parse_class_dice() appear, to me, to be attached to a spell effect of a spell in a spell book.

                        But this is all cleaned up in cleanup_class(). For every spell in every book, a call to free_effect(spell->effect) is made which frees the dice:

                        Code:
                        void free_effect(struct effect *source)
                        {
                        	struct effect *e = source, *e_next;
                        	while (e) {
                        		e_next = e->next;
                        		[b]dice_free(e->dice);   <--- HERE[/b]
                        		mem_free(e);
                        		e = e_next;
                        	}
                        }
                        So why are we left with one solitary dice in memory?

                        I even went as far as logging every dice create and free, and sure enough, there is an odd number of rows - So there is a mismatch between creation and freeing

                        EDIT: Found it - Bug in class.txt:1979

                        Code:
                        spell:Resist Heat and Cold:15:13:45:4
                        effect:TIMED_INC:OPP_COLD
                        dice:10+d10
                        effect:TIMED_INC:OPP_FIRE
                        dice:10+d10
                        desc:Gives you temporary resistance to cold and fire, for 10+1d10 turns each.
                        [b]dice:10+d10  <-- HERE[/b]
                        desc:Gives you temporary resistance to cold and fire, for 10+1d10 turns each.
                        EDIT: Which raises the question - How much sanity checking do we need for these files? It's pretty easy to put in a check for this instance (it's how I found it after all), but do we need to trap every case of possible double mem_alloc() due to an error in the data files? And what do we do if we find one - mem_free() the fist and overwrite with the new one? That can get messy if you accidentally duplicate an effect:, or spell:, or book: line as it would requiring rewinding at various levels.

                        I'm thinking to catch the duplicates (easy to do) and spit out an plog message and just ignore the duplicate
                        Last edited by calris; March 29, 2016, 14:01.

                        Comment

                        • takkaria
                          Veteran
                          • Apr 2007
                          • 1951

                          #13
                          Originally posted by calris
                          EDIT: Which raises the question - How much sanity checking do we need for these files? It's pretty easy to put in a check for this instance (it's how I found it after all), but do we need to trap every case of possible double mem_alloc() due to an error in the data files? And what do we do if we find one - mem_free() the fist and overwrite with the new one? That can get messy if you accidentally duplicate an effect:, or spell:, or book: line as it would requiring rewinding at various levels.

                          I'm thinking to catch the duplicates (easy to do) and spit out an plog message and just ignore the duplicate
                          I think the correct behaviour on input errors at initialisation is to report an error and exit. Seems like a whole bunch of parsing logic could do with this, though I suppose it's not as important if there's no memory allocation happening.
                          takkaria whispers something about options. -more-

                          Comment

                          • redlumf
                            Scout
                            • Aug 2015
                            • 25

                            #14
                            Amiga OS has no resource tracking, which means if you leak memory then the memory is "lost" for the OS. Of course, C's malloc does some housekeeeping and autoclean up at exit. But if you hit the OS directly bypassing it...

                            Code:
                            3.Work:t/asm/ang> avail
                            Type  Available    In-Use   Maximum   Largest
                            chip    2006040     87016   2093056   2005176
                            fast  150653816    341128 150994944 134217696
                            total 152659856    428144 153088000 134217696
                            3.Work:t/asm/ang> angband >ser:
                            3.Work:t/asm/ang> avail         
                            Type  Available    In-Use   Maximum   Largest
                            chip    2006040     87016   2093056   2005176
                            fast  150653736    341208 150994944 134217696
                            total 152659776    428224 153088000 134217696
                            3.Work:t/asm/ang> eval 428224-428144
                            80
                            There! 80 bytes lost by starting in town and exit.

                            Comment

                            Working...
                            😀
                            😂
                            🥰
                            😘
                            🤢
                            😎
                            😞
                            😡
                            👍
                            👎