Preparation for 4.1 release

  • kandrc
    Swordsman
    • Dec 2007
    • 299

    I just had the following lovely assertion:

    Code:
    angband: cave.c:435: object_lists_check_integrity: Assertion `pile_contains(c->squares[obj->iy][obj->ix].obj, obj)' failed.
    
    Program received signal SIGABRT, Aborted.
    0x0000007fb7c77528 in __GI_raise (sig=sig@entry=6)
        at ../sysdeps/unix/sysv/linux/raise.c:54
    54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
    (gdb) bt
    #0  0x0000007fb7c77528 in __GI_raise (sig=sig@entry=6)
        at ../sysdeps/unix/sysv/linux/raise.c:54
    #1  0x0000007fb7c789e0 in __GI_abort () at abort.c:89
    #2  0x0000007fb7c70c04 in __assert_fail_base (
        fmt=0x7fb7d5d0c0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
        assertion=assertion@entry=0x523f10 "pile_contains(c->squares[obj->iy][obj->ix].obj, obj)", file=file@entry=0x523eb8 "cave.c", line=line@entry=435, 
        function=function@entry=0x523fe0 <__PRETTY_FUNCTION__.9844> "object_lists_check_integrity") at assert.c:92
    #3  0x0000007fb7c70cac in __GI___assert_fail (
        assertion=0x523f10 "pile_contains(c->squares[obj->iy][obj->ix].obj, obj)", 
        file=0x523eb8 "cave.c", line=435, 
        function=0x523fe0 <__PRETTY_FUNCTION__.9844> "object_lists_check_integrity") at assert.c:101
    #4  0x00000000004048ec in object_lists_check_integrity (c=0xa66838, 
        c_k=0x990b98) at cave.c:435
    #5  0x000000000040a140 in square_know_pile (c=0xa66838, y=4, x=33)
        at cave-square.c:916
    #6  0x0000000000405670 in square_note_spot (c=0xa66838, y=4, x=33)
        at cave-map.c:219
    #7  0x000000000040bf34 in update_one (c=0xa66838, y=4, x=33, blind=0)
        at cave-view.c:514
    #8  0x000000000040c870 in update_view (c=0xa66838, p=0x77f4d8)
        at cave-view.c:659
    #9  0x00000000004a88ec in update_stuff (p=0x77f4d8) at player-calcs.c:2352
    #10 0x00000000004a8bec in handle_stuff (p=0x77f4d8) at player-calcs.c:2470
    #11 0x00000000004d779c in save_game () at ui-game.c:486
    #12 0x00000000004d4d94 in new_level_display_update (
        type=EVENT_NEW_LEVEL_DISPLAY, data=0x0, user=0x0) at ui-display.c:2207
    #13 0x000000000042259c in game_event_dispatch (type=EVENT_NEW_LEVEL_DISPLAY, 
        data=0x0) at game-event.c:43
    #14 0x00000000004228f8 in event_signal (type=EVENT_NEW_LEVEL_DISPLAY)
        at game-event.c:142
    #15 0x00000000004255c8 in on_new_level () at game-world.c:872
    #16 0x0000000000425b0c in run_game_loop () at game-world.c:1002
    #17 0x00000000004d7638 in play_game (new_game=false) at ui-game.c:435
    #18 0x000000000051c994 in main (argc=1, argv=0x7ffffff408) at main.c:524
    (gdb)
    This looks like one of those cases where simply examining the stack is not very useful. Was I bitten by some debugging code that was inserted to track down the recent pile issues?

    Note that it occurred during an autosave. Whole games get borked when you abort in the middle of writing the save file to disc. I didn't even get a tombstone. :'( I know; first world problems. Regardless, the game should never abort while writing to disc. This was an assert, so if I hadn't compiled with debugging symbols, it may not have failed. But I'm not sure; I didn't look at the code to see if the next line would have crashed. That said, a crash while writing to disc is a worst case scenario for a game.

    Of course, the best solution is to write perfect code, but failing that, you can replace your SIGSEGV handler while saving and use longjmp() to unwind back to before you started saving. If it was an autosave, the user can go on blissfully unaware, and hopefully it never matters. If it's a quit-save, maybe you can stick @ in town and try to save again? That could really suck, but it's better than a corrupt save file.
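The handler-swap idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Angband code: `guarded_save`, `save_fault_handler`, and the `write_save` callback are hypothetical names, and `raise(SIGABRT)` stands in for a failed assertion. It uses `sigsetjmp()`/`siglongjmp()` rather than plain `longjmp()` so the signal mask is restored after leaving the handler. Whether the game state is still usable after such a jump is not guaranteed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Jump target, valid only while a guarded save is in progress. */
static sigjmp_buf save_escape;

/* Handler installed only for the duration of the save. */
static void save_fault_handler(int sig)
{
    (void)sig;
    siglongjmp(save_escape, 1);
}

/*
 * Run write_save() with SIGABRT and SIGSEGV redirected to save_escape.
 * Returns true only if write_save() ran to completion and reported success.
 * write_save is a hypothetical callback standing in for the real save code.
 */
bool guarded_save(bool (*write_save)(void))
{
    struct sigaction guard, old_abrt, old_segv;
    volatile bool ok = false;

    guard.sa_handler = save_fault_handler;
    sigemptyset(&guard.sa_mask);
    guard.sa_flags = 0;
    sigaction(SIGABRT, &guard, &old_abrt);
    sigaction(SIGSEGV, &guard, &old_segv);

    /* Second argument 1: save the signal mask so siglongjmp restores it. */
    if (sigsetjmp(save_escape, 1) == 0)
        ok = write_save();
    /* Otherwise we arrived here from the handler and ok is still false. */

    /* Put the previous handlers back. */
    sigaction(SIGABRT, &old_abrt, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return ok;
}

/* Demo callbacks: a healthy save, and one that hits a failed check. */
bool demo_good_save(void) { return true; }
bool demo_bad_save(void) { raise(SIGABRT); return true; }
```

A caller would treat a `false` return from `guarded_save()` as "this save did not happen" and keep the previous save file.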

    Comment

    • Pete Mack
      Prophet
      • Apr 2007
      • 6883

      Simply destroying/ignoring the dirty pile would be enough to make a clean save. But I agree abort() in a save is the wrong way to go. It'd be better to do an integrity check prior to the save.
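A pre-save integrity check that reports a problem instead of aborting might look something like the sketch below. The types and names (`struct obj`, `struct grid`, `objects_consistent`) are simplified, hypothetical stand-ins and not the game's real structures; the check mirrors the condition in the failed assertion, that every listed object is actually in the pile on the square it claims to occupy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GRID_H 2
#define GRID_W 3

/* Hypothetical, simplified stand-ins for the game's object and grid types. */
struct obj {
    int y, x;          /* square the object claims to be on */
    struct obj *next;  /* next object in the same floor pile */
};

struct grid {
    struct obj *pile[GRID_H][GRID_W];  /* head of the pile on each square */
};

/* True if obj is linked into the pile starting at head. */
static bool pile_has(const struct obj *head, const struct obj *obj)
{
    for (; head; head = head->next)
        if (head == obj) return true;
    return false;
}

/*
 * Non-aborting integrity check: returns false instead of asserting,
 * so the caller can repair or skip the bad pile before saving.
 */
bool objects_consistent(const struct grid *g, struct obj *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct obj *o = list[i];
        if (!o) continue;
        if (o->y < 0 || o->y >= GRID_H || o->x < 0 || o->x >= GRID_W)
            return false;
        if (!pile_has(g->pile[o->y][o->x], o))
            return false;
    }
    return true;
}
```

The save path could call such a check first and only write the file when it returns true.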

      Comment

      • Ingwe Ingweron
        Veteran
        • Jan 2009
        • 2129

        Originally posted by Nick
        Monsters with fire immunity can walk, and pathfind, through lava
        Unfortunately, the fix for this bug may have gone too far. Ancagalon just "pushed past" a Master Vampire and then again a Vampire Lord. Both times the Vampires moved onto lava that had come from Ancagalon's breath attacks while fighting @. The Vampires should not be swapping places with Ancagalon when that place is lava. It seems to me that they should have been pushed to other open squares, rather than the lava square previously occupied by Ancagalon.
        “We're more of the love, blood, and rhetoric school. Well, we can do you blood and love without the rhetoric, and we can do you blood and rhetoric without the love, and we can do you all three concurrent or consecutive. But we can't give you love and rhetoric without the blood. Blood is compulsory. They're all blood, you see.”
        ― Tom Stoppard, Rosencrantz and Guildenstern are Dead

        Comment

        • wobbly
          Prophet
          • May 2012
          • 2631

          Originally posted by Ingwe Ingweron
          Unfortunately, the fix for this bug may have gone too far. Ancagalon just "pushed past" a Master Vampire and then again a Vampire Lord. Both times the Vampires moved onto lava that had come from Ancagalon's breath attacks while fighting @. The Vampires should not be swapping places with Ancagalon when that place is lava. It seems to me that they should have been pushed to other open squares, rather than the lava square previously occupied by Ancagalon.
          Is this any worse than trampling or friendly fire from breath weapons? I didn't see much lava in my game; is the vampire any worse off in the lava than it is between you & Ancagalon?

          Comment

          • Huqhox
            Adept
            • Apr 2016
            • 145

            Originally posted by Ingwe Ingweron
            Unfortunately, the fix for this bug may have gone too far. Ancagalon just "pushed past" a Master Vampire and then again a Vampire Lord. Both times the Vampires moved onto lava that had come from Ancagalon's breath attacks while fighting @. The Vampires should not be swapping places with Ancagalon when that place is lava. It seems to me that they should have been pushed to other open squares, rather than the lava square previously occupied by Ancagalon.
            This sounds reasonable. The vampires are not politely stepping out of Ancagalon's way; he is shoving past them as they are beneath his notice; he doesn't care if they end up in lava or not.
            "This has not been a recording"

            Comment

            • Nick
              Vanilla maintainer
              • Apr 2007
              • 9637

              Originally posted by Nomad
              Turns out death by treacherous weapon curse gets you quite a tombstone:

              I think I've just created the V equivalent of Nowhere Town
              One for the Dark Lord on his dark throne
              In the Land of Mordor where the Shadows lie.

              Comment

              • Mondkalb
                Knight
                • Apr 2007
                • 982

                Originally posted by Nick
                I think I've just created the V equivalent of Nowhere Town
                Should be more poetic, just change it to "sigh no more"

                Comment

                • Nick
                  Vanilla maintainer
                  • Apr 2007
                  • 9637

                  Originally posted by kandrc
                  Note that it occurred during an autosave. Whole games get borked when you abort in the middle of writing the save file to disc. I didn't even get a tombstone. :'( I know; first world problems. Regardless, the game should never abort while writing to disc. This was an assert, so if I hadn't compiled with debugging symbols, it may not have failed. But I'm not sure; I didn't look at the code to see if the next line would have crashed. That said, a crash while writing to disc is a worst case scenario for a game.
                  Looking closely, I believe the game was not writing to disc at the time of this stack dump. It was in the beginning of save_game(), before it had actually got to attempting to write a savefile. I think the problem arose when the game got the abort signal, tried to "panic save", and failed - presumably for the reasons related to the original assert failure.

                  Here is the relevant code from the abort signal:
                  Code:
                  static void handle_signal_abort(int sig)
                  {
                  	/* Disable handler */
                  	(void)(*signal_aux)(sig, SIG_IGN);
                  
                  	/* Nothing to save, just quit */
                  	if (!character_generated || character_saved) quit(NULL);
                  
                  	/* Clear the bottom line */
                  	Term_erase(0, 23, 255);
                  
                  	/* Give a warning */
                  	Term_putstr(0, 23, -1, COLOUR_RED,
                  	            "A gruesome software bug LEAPS out at you!");
                  
                  	/* Message */
                  	Term_putstr(45, 23, -1, COLOUR_RED, "Panic save...");
                  
                  	/* Flush output */
                  	Term_fresh();
                  
                  	/* Panic save */
                  	my_strcpy(player->died_from, "(panic save)", sizeof(player->died_from));
                  
                  	/* Forbid suspend */
                  	signals_ignore_tstp();
                  
                  	/* Attempt to save */
                  	if (savefile_save(savefile))
                  		Term_putstr(45, 23, -1, COLOUR_RED, "Panic save succeeded!");
                  	else
                  		Term_putstr(45, 23, -1, COLOUR_RED, "Panic save failed!");
                  
                  	/* Flush output */
                  	Term_fresh();
                  
                  	/* Quit */
                  	quit("software bug");
                  }
                  So I have two questions:
                  1. Panic saves only seem to cause problems - would we be better abandoning the attempt, as then the worst that happens is the player loses the current level?
                  2. Which nightly was it? This will let me know if the underlying problem was likely to be caused by one of the new (since removed) level generation schemes.

                  Comment

                  • AnonymousHero
                    Veteran
                    • Jun 2007
                    • 1393

                    Originally posted by Nick
                    Looking closely, I believe the game was not writing to disc at the time of this stack dump. It was in the beginning of save_game(), before it had actually got to attempting to write a savefile. I think the problem arose when the game got the abort signal, tried to "panic save", and failed - presumably for the reasons related to the original assert failure.
                    ---snip---
                    So I have two questions:
                    1. Panic saves only seem to cause problems - would we be better abandoning the attempt, as then the worst that happens is the player loses the current level?
                    2. Which nightly was it? This will let me know if the underlying problem was likely to be caused by one of the new (since removed) level generation schemes.
                    Panic saves are an absolutely crazy idea -- trying to do anything non-trivial during a signal handler is a recipe for disaster especially when the signal in question is trying to tell you that memory is probably corrupted or "our internal state is inconsistent" (assertions). Signal handling is notoriously difficult to get right and is race-prone, etc. etc. (Sometimes one has no choice because POSIX, but...).

                    Anyway, just save every N turns and on every level change. [1]

                    I don't know if the anti-cheat code is still there, but if so I'd also suggest removing the SIGINT(?) handler responsible for that. It's killed a few of my characters because a remote X11 connection was flaky.

                    [1] If it doesn't already do that. IIRC the idiom goes: create new temporary file in same directory as old file, write contents, fsync, rename over old, fsync. (I'm sure there's someone who has a proper description online.)
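The write-then-rename idiom in the footnote can be sketched in POSIX C along these lines. This is a minimal illustration, not the game's actual save code: `atomic_save` and `write_all` are hypothetical names, the `.new` suffix for the temporary file is an arbitrary choice, and error handling is deliberately simple (for example, an interrupted `write()` is treated as a failure).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes, continuing after short writes. */
static bool write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) return false;
        buf += n;
        len -= (size_t)n;
    }
    return true;
}

/*
 * Replace dir/name with the given bytes by writing a temporary file in
 * the same directory and renaming it over the old one. If anything
 * fails before the rename, the old file is left untouched.
 */
bool atomic_save(const char *dir, const char *name,
                 const char *data, size_t len)
{
    char tmp[512], dst[512];
    int fd, dfd;
    bool ok;

    if (snprintf(tmp, sizeof tmp, "%s/%s.new", dir, name) >= (int)sizeof tmp)
        return false;
    if (snprintf(dst, sizeof dst, "%s/%s", dir, name) >= (int)sizeof dst)
        return false;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;

    /* Flush the file contents to disk before the rename. */
    ok = write_all(fd, data, len) && fsync(fd) == 0;
    if (close(fd) != 0) ok = false;
    if (ok && rename(tmp, dst) != 0) ok = false;
    if (!ok) {
        unlink(tmp);
        return false;
    }

    /* Sync the directory so the rename itself is made durable. */
    dfd = open(dir, O_RDONLY);
    if (dfd >= 0) {
        (void)fsync(dfd);
        close(dfd);
    }
    return true;
}
```

With this shape, a crash in the middle of a save leaves at worst a stray temporary file next to an intact older save.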

                    Comment

                    • Nick
                      Vanilla maintainer
                      • Apr 2007
                      • 9637

                      New builds are now up on the nightlies page, changes as follows:
                      • PowerWyrm's fix to the player resistance panels
                      • Torches of Brightness can be generated again
                      • I have checked all the possibilities for automatic opening of doors and disarming of traps - walking, running, etc - and believe they all now work correctly
                      • Ghosts in walls no longer get blacked out
                      • Tombstones for death by treacherous weapon are now more accurate (if less poetic)


                      As before, I now believe I have got all the bugs. Please let me know if I'm wrong about this.

                      Comment

                      • Gwarl
                        Administrator
                        • Jan 2017
                        • 1025

                        I just want to say that right now the webserver is relying on panic saves - if a user disconnects without quitting, the angband process gets left without a pty and quickly eats the cpu doing heaven knows what, so I have to send it a sigkill. The fact it does panic saves when it gets them is the only thing keeping progress from being lost - I've managed to lose poschengband progress on the server by not quitting properly.

                        If we lose the panic save on sigkill, can we at least get it responding to another signal which saves before quitting? Things are set up so that the process controlling the pty attempts to write ^x before sending the sigkill but it doesn't always work.
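One way a "save before quitting" signal could work without doing the save inside the handler is sketched below. It is an assumption-laden illustration rather than anything in the Angband source: the choice of `SIGHUP` and `SIGTERM`, and the names `install_save_signals`, `poll_save_signals`, and the `do_save` callback, are all hypothetical. The handler only sets a flag, and the game loop performs an ordinary save when it next sees the flag. This only helps if the loop actually gets to poll the flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Set by the handler, read and cleared by the game loop. */
static volatile sig_atomic_t save_and_quit_requested = 0;

/* The handler only records the request; no saving happens here. */
static void request_save_and_quit(int sig)
{
    (void)sig;
    save_and_quit_requested = 1;
}

/* Install the handler for SIGHUP and SIGTERM. Returns false on failure. */
bool install_save_signals(void)
{
    struct sigaction sa;

    sa.sa_handler = request_save_and_quit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGHUP, &sa, NULL) == 0
        && sigaction(SIGTERM, &sa, NULL) == 0;
}

/*
 * Meant to be called once per iteration of the game loop, from normal
 * (non-handler) context. If a signal asked for a save, run do_save()
 * and return true to tell the loop to exit. do_save is a hypothetical
 * callback standing in for the regular save routine.
 */
bool poll_save_signals(bool (*do_save)(void))
{
    if (!save_and_quit_requested) return false;
    save_and_quit_requested = 0;
    (void)do_save();
    return true;
}

/* Demo callback that just counts how often it was asked to save. */
int demo_save_count = 0;
bool demo_counting_save(void) { demo_save_count++; return true; }
```

The server side could then send one of these signals first and fall back to a hard kill only if the process does not exit within some timeout.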

                        Comment

                        • kandrc
                          Swordsman
                          • Dec 2007
                          • 299

                          Originally posted by Nick
                          1. Panic saves only seem to cause problems - would we be better abandoning the attempt, as then the worst that happens is the player loses the current level?
                          2. Which nightly was it? This will let me know if the underlying problem was likely to be caused by one of the new (since removed) level generation schemes.
                          1. There's nothing inherently wrong with panic saving. As noted in another comment, however, you can get into situations where you're touching corrupt state while simultaneously trying to recover from having touched corrupt state, then you wind up with a broken save file.

                            You could panic save to a different location. On startup, attempt to load the panic-save file if it exists. If that load succeeds, move it to the default location; if it fails, unlink it and load the default (auto-save) file.
                          2. It was g8886dc4.
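The separate panic-save file from point 1 could be handled at load time roughly as follows. This is a sketch with hypothetical names (`load_with_panic_fallback`, `load_fn`, `demo_load`); the real loader and file layout are not shown in this thread, so the "file starts with OK" test is purely a stand-in for "the savefile loads cleanly".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Which file the character was loaded from. */
enum save_source { SAVE_NONE, SAVE_REGULAR, SAVE_PANIC };

/* Hypothetical loader callback: true if the file at path loads cleanly. */
typedef bool (*load_fn)(const char *path);

/*
 * Prefer a loadable panic save and promote it to the regular location.
 * A panic save that fails to load is deleted and the regular file is
 * used instead.
 */
enum save_source load_with_panic_fallback(const char *regular,
                                          const char *panic,
                                          load_fn try_load)
{
    FILE *f = fopen(panic, "rb");

    if (f) {
        fclose(f);
        if (try_load(panic)) {
            /* If the rename fails, the panic file simply stays in place. */
            (void)rename(panic, regular);
            return SAVE_PANIC;
        }
        /* Broken panic save: discard it and fall through. */
        remove(panic);
    }
    return try_load(regular) ? SAVE_REGULAR : SAVE_NONE;
}

/* Demo loader: a file "loads" if it starts with the bytes "OK". */
bool demo_load(const char *path)
{
    char buf[3] = {0};
    FILE *f = fopen(path, "rb");
    size_t n;

    if (!f) return false;
    n = fread(buf, 1, 2, f);
    fclose(f);
    return n == 2 && strcmp(buf, "OK") == 0;
}
```

Because the panic save never overwrites the regular file directly, a panic save that went wrong costs at most the progress since the last autosave.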

                          Comment

                          • Ingwe Ingweron
                            Veteran
                            • Jan 2009
                            • 2129

                            Originally posted by Huqhox
                            This sounds reasonable. The vampires are not politely stepping out of Ancagalon's way; he is shoving past them as they are beneath his notice; he doesn't care if they end up in lava or not.
                            The point is, it is a change that was introduced by the fix for fire IMMUNE monsters being able to route-find past lava. Previously, a non-immune monster could not be pushed onto lava, only an immune monster could cross it (albeit with route-finding problems). Now a non-immune monster is found standing on lava (and also doesn't appear to take any damage for doing so).

                            To my mind, this is a bug.

                            Comment

                            • Ingwe Ingweron
                              Veteran
                              • Jan 2009
                              • 2129

                              Originally posted by Nick
                              As before, I now believe I have got all the bugs. Please let me know if I'm wrong about this.
                              Do you consider it a bug for non-fire immune monsters to be pushed onto lava when non-lava spaces are available for them? Do you consider it a bug for non-fire immune monsters to exist on lava squares without apparently taking any damage? If so, these are two bugs.

                              Comment

                              • Derakon
                                Prophet
                                • Dec 2009
                                • 9022

                                Originally posted by Gwarl
                                I just want to say that right now the webserver is relying on panic saves - if a user disconnects without quitting, the angband process gets left without a pty and quickly eats the cpu doing heaven knows what, so I have to send it a sigkill. The fact it does panic saves when it gets them is the only thing keeping progress from being lost - I've managed to lose poschengband progress on the server by not quitting properly.
                                Can you have players interact with the server via screen or some similar utility? Then the angband process always has a terminal to talk to, and the player connects to that terminal rather than to Angband directly. You can also take over screen processes when players leave and do anything you want to them, without relying on panic anything.

                                Comment
