Looking for a SQL guru

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Magnate
    Angband Devteam member
    • May 2007
    • 4916

    Looking for a SQL guru

    Is there anybody with some didactic talent who would be willing to help me get to grips with SQL? I have the stats from twenty-five million dungeons for 3.3.0 on my server (250,000 runs of levels 1-100; password available on request for those interested in browsing), but I lack the skill to make the queries I want. I could of course RTFM but I thought it might be more enjoyable to learn from a fellow Angbander, if anyone was interested in sharing their expertise by email on an irregular basis. (I mention didactic talent because I'd quite like to learn the concepts, rather than just get someone to write queries for me.)

    For example, I don't even know if it's possible for queries to produce two-dimensional (tabular) output, instead of one-dimensional lists. So I know that I can ask it "list the numbers of base items which are found with +speed between +1 and +4 on dungeon levels 1 to 50", but can I ask it instead for a table, showing me the numbers of items with each of +1, +2, +3 and +4? Or do I just have to run four separate queries for that?

    Ho hum. The joys of stats await ...
    "Been away so long I hardly knew the place, gee it's good to be back home" - The Beatles
  • SaThaRiel
    Adept
    • Nov 2009
    • 142

    #2
    Which database engine are you using?
    Also it maybe interesting to see the table structure.
    Proud candidate for the Angband Darwin Award!

    Comment

    • pav
      Administrator
      • Apr 2007
      • 484

      #3
      Sure, feel free to fire your questions my way.
      See the elves and everything! http://angband.oook.cz

      Comment

      • AnonymousHero
        Veteran
        • Jun 2007
        • 1322

        #4
        IMO, SQL is not really all that great for this kind of "spelunking". You'd probably be better of just doing this using custom python/perl scripts or something.

        ... at least as long as you can either 1) fit data into memory, or 2) do whatever you need to do in a few passes. If you can't you're probably going to need a full database with good indexes anyway... that is likely to be at least as much effort as just writing a few scripts.

        Of course, if you just want to learn... have at it!

        EDIT: In other words: RDMBS'es aren't magic that you can sprinkle onto a "lots of data" problem and expect things to just work faster.

        Comment

        • NinjaN
          Rookie
          • Aug 2011
          • 7

          #5
          What better way to have a first post than this?


          I wouldn't call myself SQL guru, but I'm using it on a daily basis, it's what I do at work...

          What you are looking for here is one of many aggregation functions, namely COUNT().

          In your query, you want to have two fields: the value of +speed and how often that value was found. So your query begins like this (keywords in upper case):

          SELECT plusspeed, COUNT(*)
          FROM yourtable

          The function COUNT() does what it says; it counts the rows in your result. Only COUNT(*) in your query would give you the total sum of results fitting your filters. Put some detail fields in your SELECT and you can group that sum (more on that later).
          Then you want to filter what is queried, so your query becomes:

          SELECT plusspeed, COUNT(*)
          FROM yourtable
          WHERE (dungeonlevel BETWEEN 1 AND 50)
          AND (plusspeed BETWEEN 1 AND 4)

          This won't work, it will give you an error message. This is because of that aggregration function COUNT(). You need to group every field in SELECT which is _not_ aggregated (in this example only plusspeed):

          SELECT plusspeed, COUNT(*)
          FROM yourtable
          WHERE (dungeonlevel BETWEEN 1 AND 50)
          AND (plusspeed BETWEEN 1 AND 4)
          GROUP BY plusspeed

          Last, you could order your result:

          SELECT plusspeed, COUNT(*)
          FROM yourtable
          WHERE (dungeonlevel BETWEEN 1 AND 50)
          AND (plusspeed BETWEEN 1 AND 4)
          GROUP BY plusspeed
          ORDER BY plusspeed


          If you want to go into detail, e. g. see at which dungeonlevel which speed enhancement was found, you can have two (or more) detail fields. Just remember to group these:


          SELECT dungeonlevel, plusspeed, COUNT(*)
          FROM yourtable
          WHERE (dungeonlevel BETWEEN 1 AND 50)
          AND (plusspeed BETWEEN 1 AND 4)
          GROUP BY dungeonlevel, plusspeed
          ORDER BY dungeonlevel, plusspeed


          The last two queries should work for your table, given you query existing fields, of course.
          Feel free to ask away, I find writing SQL to be quite fun. Also, if you happen to have more questions, I'd be happy to try to help you answering those. Knowing (and having access) to your database would help there... ;-P

          Hope you can understand this blubber of mine... ;-)


          Cheers, NinjaN

          Comment

          • Magnate
            Angband Devteam member
            • May 2007
            • 4916

            #6
            @NinjaN - thank you, that's awesome! Really easy to follow - I went to the database and re-phrased your query into this:

            SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 50) AND (pval BETWEEN 1 AND 4) GROUP BY level, pval ORDER BY level, pval

            and it worked, first time! There were 67 items with +4 speed dropped on dl1, out of a total of 734 items with +speed dropped on dl1 (that seems quite high - 0.3% chance in any given game of getting a speed item on dl1?!). Many thanks to you and pav for your willingness to help - I will come back with more queries on queries.

            @SaThaRiel - the table structure is set out in the source code in src/stats/ - myshkin designed it, and so far it has given me all the info I have asked for!

            @AnonymousHero - you may be right that perl/whatever would be better for running custom queries, but we chose SQL so that the raw data was transferable between people in a format that's widely used. Generating the data takes quite a long time (1.2s per run for me) and has a footprint of ~2.5GB, so we wanted to enable people to query data without having to run stats themselves.
            "Been away so long I hardly knew the place, gee it's good to be back home" - The Beatles

            Comment

            • Timo Pietilä
              Prophet
              • Apr 2007
              • 3964

              #7
              Originally posted by Magnate
              @NinjaN - thank you, that's awesome! Really easy to follow - I went to the database and re-phrased your query into this:

              SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 50) AND (pval BETWEEN 1 AND 4) GROUP BY level, pval ORDER BY level, pval

              and it worked, first time! There were 67 items with +4 speed dropped on dl1, out of a total of 734 items with +speed dropped on dl1 (that seems quite high - 0.3% chance in any given game of getting a speed item on dl1?!).
              Those would be Ring of Escaping & Rings of Teleportation. +4 & +2 speed, not much OoD.

              Comment

              • Magnate
                Angband Devteam member
                • May 2007
                • 4916

                #8
                Originally posted by Timo Pietilä
                Those would be Ring of Escaping & Rings of Teleportation. +4 & +2 speed, not much OoD.
                Ah yes - thanks Timo, you've neatly illustrated my next question for my mentors - how do I *exclude* certain things from a query? So for example, I want to run that same query but excluding where k_idx == ring_of_escaping etc.

                EDIT: Doh!! I just realised that my query looks at ALL pvals, not just speed. I need to add WHERE of_idx IS speed ...

                ... ok, done that - but now the numbers look too low. My query is:

                SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 100) AND (pval BETWEEN 1 AND 50) AND (of_idx IS 13) GROUP BY level, pval ORDER BY level, pval

                and this shows me only 30 +speed items dropped at dl1 in 250,000 games. More worryingly, it doesn't show any kind of spike at dl5 (where =telep comes into depth) or dl15 (where =escaping comes into depth). So I am now wondering whether the wearables_pval_flags table is right - I'll need to cross-check against a query about those specific rings to be certain.
                Last edited by Magnate; August 31, 2011, 13:38.
                "Been away so long I hardly knew the place, gee it's good to be back home" - The Beatles

                Comment

                • SaThaRiel
                  Adept
                  • Nov 2009
                  • 142

                  #9
                  Magnate: Im not getting any smarter from the src/stats directory...looks more like some db driver capabilities.
                  Can you mail me the database-file? You should get my mail address from trac.

                  You qurey looks really wrong to me. But i need to figure out whats wrong.
                  Proud candidate for the Angband Darwin Award!

                  Comment

                  • Magnate
                    Angband Devteam member
                    • May 2007
                    • 4916

                    #10
                    Originally posted by SaThaRiel
                    Magnate: Im not getting any smarter from the src/stats directory...looks more like some db driver capabilities.
                    Can you mail me the database-file?
                    It's about 500MB, so I don't think that'd be a good idea. I sent you the url and pw in a PM, but had some problems (Oook said something about a missing security token a few times). If you don't get the PM, let me know and I'll email you.
                    "Been away so long I hardly knew the place, gee it's good to be back home" - The Beatles

                    Comment

                    • Azerath
                      Rookie
                      • Jun 2011
                      • 21

                      #11
                      1. When you want to get rectangular table instead of flat list of records, usually databases support such functionality search for (PIVOT in SqlServer; CrossTab in postgress, etc)

                      Summarizing instread of getting this (flat list of records):
                      Action1 Ring 123
                      Action1 Sword 45
                      Action2 Ring 112
                      Action2 Sword 55
                      Action3 Ring 101
                      Action3 Sword 33

                      get this (rectangular result):
                      Action Ring Sword
                      Action1 123 45
                      Action2 112 55
                      Action3 101 33

                      2. Excluding can be done like in C/C++ AND (... OR ... OR ...), but exclusion from some set use IN or NOT IN

                      SELECT COUNT(1) FROM Abc WHERE (Level BETWEEN 1 AND 100) AND Kind NOT IN ('Ring', 'Sword')

                      EDIT:

                      3. Rectangular result can be achieved using sub queries if are supported by DB (maybe not fastest way, but will work):
                      [example from http://stackoverflow.com/questions/2...ivot-examples]

                      select
                      distinct a,
                      (select distinct t2.b from t t2 where t1.a=t2.a and t2.b='Ring'),
                      (select distinct t2.b from t t2 where t1.a=t2.a and t2.b='Sword')
                      from t t1

                      Comment

                      • Timo Pietilä
                        Prophet
                        • Apr 2007
                        • 3964

                        #12
                        Originally posted by Magnate
                        SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 100) AND (pval BETWEEN 1 AND 50) AND (of_idx IS 13) GROUP BY level, pval ORDER BY level, pval
                        I think your select is too restricted.

                        Try SELECT *, COUNT(*) FROM ....

                        Comment

                        • NinjaN
                          Rookie
                          • Aug 2011
                          • 7

                          #13
                          @Timo:

                          SELECT *, COUNT(*)... will not work, at least as far as I know. You can't group by * which is necessary for COUNT() to work. One could SELECT every field and GROUP BY every field but then COUNT() is almost certainly always 1 (because each unique set of fields exists once, eg. there is an autoincremented counter for id'ing that row).


                          @Magnate:

                          Your query

                          SELECT level, pval, COUNT(*)
                          FROM `wearables_pval_flags`
                          WHERE (level BETWEEN 1 AND 100)
                          AND (pval BETWEEN 1 AND 50)
                          AND (of_idx IS 13)
                          GROUP BY level, pval
                          ORDER BY level, pval

                          is correct in syntax. Since I don't know about the structure of the database I can't say if it is 'asking' for the correct data...


                          @Azerath:
                          You are correct, subqueries can make this result better readable. The query should work like this:

                          SELECT DISTINCT level,
                          (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 1) AND (of_idx = 13)) AS 'Speed +1',
                          (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 2) AND (of_idx = 13)) AS 'Speed +2',
                          (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 3) AND (of_idx = 13)) AS 'Speed +3',
                          (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 4) AND (of_idx = 13)) AS 'Speed +4',
                          (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval >= 5) AND (of_idx = 13)) AS 'Speed > +4'
                          FROM 'wearable_pval_flafs' wpf1
                          WHERE (wpf1.level BETWEEN 1 AND 50)
                          ORDER BY level


                          @Magnate:
                          What the above query does is the following:

                          It asks for all unique entries for level (DISTINCT is the Keyword for that) in your table and reports them once only.

                          Notice the alias 'wpf1' in the FROM part. This alias can be used instead of the full table name. You see that wpf1 is my alias for the 'outer' table while wpf2 is used inside the subqueries. That way, we can address the same table twice while being able to query a specific 'copy' of that table.

                          In the following nested SELECT-statements (also called subqueries), the value for level is put in as filter (wpf2.level = wpf1.level) while the other filters are just for wpf2 (no alias always means 'this table' which is the inner one). Subqueries can only deliver exactly one field, COUNT(*) in this example.

                          Behind each subquery you see another alias. This is an alias for that column, so it gets a name in your result. You can also give 'normal' fields an alias, so instead of eg. 'pval' you get 'speed enhancement' in your result. It's important to know that you can't use this alias in other parts of your query. An alias for columns only shows in the final result set, nowhere else. Aliases for tables can be used in your whole query, though.


                          Hope this is as helpful and understandable as my other post...


                          Cheers, NinjaN

                          Comment

                          • Magnate
                            Angband Devteam member
                            • May 2007
                            • 4916

                            #14
                            Thanks again - it's getting a little more complicated now so it might take me a while to get my head around the aliasing stuff, but I will definitely try something like this to get a table of results.
                            "Been away so long I hardly knew the place, gee it's good to be back home" - The Beatles

                            Comment

                            • Xaxyx
                              Scout
                              • Feb 2010
                              • 37

                              #15
                              Pardon me for jumping in uninvited, but aren't we overcomplicating things a bit? Are we just looking for counts?

                              SELECT level,
                              SUM(CASE WHEN pval = 1 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +1',
                              SUM(CASE WHEN pval = 2 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +2',
                              SUM(CASE WHEN pval = 3 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +3',
                              SUM(CASE WHEN pval = 4 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +4',
                              SUM(CASE WHEN pval > 4 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed > +4'
                              FROM wearable_pval_flags
                              WHERE level BETWEEN 1 AND 50
                              GROUP BY level
                              ORDER BY level

                              Comment

                              Working...
                              😀
                              😂
                              🥰
                              😘
                              🤢
                              😎
                              😞
                              😡
                              👍
                              👎