Looking for a SQL guru

Collapse
X
 
  • Time
  • Show
Clear All
new posts

  • NinjaN
    replied
    @Xaxyx:

    You are not uninvited, at least not more than me... ;-P

    Your query should work as well...
    In fact, it should even work better than my solution, in other words: it will be faster (because there're no subqueries which need more time during execution). The results should be identical.

    As I said, I'm no guru, I still learn something new every day... I will try to test your solution whe I get the chance...


    Cheers, NinjaN

    Leave a comment:


  • Xaxyx
    replied
    Pardon me for jumping in uninvited, but aren't we overcomplicating things a bit? Are we just looking for counts?

    SELECT level,
    SUM(CASE WHEN pval = 1 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +1',
    SUM(CASE WHEN pval = 2 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +2',
    SUM(CASE WHEN pval = 3 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +3',
    SUM(CASE WHEN pval = 4 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +4',
    SUM(CASE WHEN pval > 4 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed > +4'
    FROM wearable_pval_flags
    WHERE level BETWEEN 1 AND 50
    GROUP BY level
    ORDER BY level

    Leave a comment:


  • Magnate
    replied
    Thanks again - it's getting a little more complicated now so it might take me a while to get my head around the aliasing stuff, but I will definitely try something like this to get a table of results.

    Leave a comment:


  • NinjaN
    replied
    @Timo:

    SELECT *, COUNT(*)... will not work, at least as far as I know. You can't group by * which is necessary for COUNT() to work. One could SELECT every field and GROUP BY every field but then COUNT() is almost certainly always 1 (because each unique set of fields exists once, eg. there is an autoincremented counter for id'ing that row).


    @Magnate:

    Your query

    SELECT level, pval, COUNT(*)
    FROM `wearables_pval_flags`
    WHERE (level BETWEEN 1 AND 100)
    AND (pval BETWEEN 1 AND 50)
    AND (of_idx IS 13)
    GROUP BY level, pval
    ORDER BY level, pval

    is correct in syntax. Since I don't know about the structure of the database I can't say if it is 'asking' for the correct data...


    @Azerath:
    You are correct, subqueries can make this result better readable. The query should work like this:

    SELECT DISTINCT level,
    (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 1) AND (of_idx = 13)) AS 'Speed +1',
    (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 2) AND (of_idx = 13)) AS 'Speed +2',
    (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 3) AND (of_idx = 13)) AS 'Speed +3',
    (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 4) AND (of_idx = 13)) AS 'Speed +4',
    (SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval >= 5) AND (of_idx = 13)) AS 'Speed > +4'
    FROM 'wearable_pval_flafs' wpf1
    WHERE (wpf1.level BETWEEN 1 AND 50)
    ORDER BY level


    @Magnate:
    What the above query does is the following:

    It asks for all unique entries for level (DISTINCT is the Keyword for that) in your table and reports them once only.

    Notice the alias 'wpf1' in the FROM part. This alias can be used instead of the full table name. You see that wpf1 is my alias for the 'outer' table while wpf2 is used inside the subqueries. That way, we can address the same table twice while being able to query a specific 'copy' of that table.

    In the following nested SELECT-statements (also called subqueries), the value for level is put in as filter (wpf2.level = wpf1.level) while the other filters are just for wpf2 (no alias always means 'this table' which is the inner one). Subqueries can only deliver exactly one field, COUNT(*) in this example.

    Behind each subquery you see another alias. This is an alias for that column, so it gets a name in your result. You can also give 'normal' fields an alias, so instead of eg. 'pval' you get 'speed enhancement' in your result. It's important to know that you can't use this alias in other parts of your query. An alias for columns only shows in the final result set, nowhere else. Aliases for tables can be used in your whole query, though.


    Hope this is as helpful and understandable as my other post...


    Cheers, NinjaN

    Leave a comment:


  • Timo Pietilä
    replied
    Originally posted by Magnate
    SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 100) AND (pval BETWEEN 1 AND 50) AND (of_idx IS 13) GROUP BY level, pval ORDER BY level, pval
    I think your select is too restricted.

    Try SELECT *, COUNT(*) FROM ....

    Leave a comment:


  • Azerath
    replied
    1. When you want to get rectangular table instead of flat list of records, usually databases support such functionality search for (PIVOT in SqlServer; CrossTab in postgress, etc)

    Summarizing instread of getting this (flat list of records):
    Action1 Ring 123
    Action1 Sword 45
    Action2 Ring 112
    Action2 Sword 55
    Action3 Ring 101
    Action3 Sword 33

    get this (rectangular result):
    Action Ring Sword
    Action1 123 45
    Action2 112 55
    Action3 101 33

    2. Excluding can be done like in C/C++ AND (... OR ... OR ...), but exclusion from some set use IN or NOT IN

    SELECT COUNT(1) FROM Abc WHERE (Level BETWEEN 1 AND 100) AND Kind NOT IN ('Ring', 'Sword')

    EDIT:

    3. Rectangular result can be achieved using sub queries if are supported by DB (maybe not fastest way, but will work):
    [example from http://stackoverflow.com/questions/2...ivot-examples]

    select
    distinct a,
    (select distinct t2.b from t t2 where t1.a=t2.a and t2.b='Ring'),
    (select distinct t2.b from t t2 where t1.a=t2.a and t2.b='Sword')
    from t t1

    Leave a comment:


  • Magnate
    replied
    Originally posted by SaThaRiel
    Magnate: Im not getting any smarter from the src/stats directory...looks more like some db driver capabilities.
    Can you mail me the database-file?
    It's about 500MB, so I don't think that'd be a good idea. I sent you the url and pw in a PM, but had some problems (Oook said something about a missing security token a few times). If you don't get the PM, let me know and I'll email you.

    Leave a comment:


  • SaThaRiel
    replied
    Magnate: Im not getting any smarter from the src/stats directory...looks more like some db driver capabilities.
    Can you mail me the database-file? You should get my mail address from trac.

    You qurey looks really wrong to me. But i need to figure out whats wrong.

    Leave a comment:


  • Magnate
    replied
    Originally posted by Timo Pietilä
    Those would be Ring of Escaping & Rings of Teleportation. +4 & +2 speed, not much OoD.
    Ah yes - thanks Timo, you've neatly illustrated my next question for my mentors - how do I *exclude* certain things from a query? So for example, I want to run that same query but excluding where k_idx == ring_of_escaping etc.

    EDIT: Doh!! I just realised that my query looks at ALL pvals, not just speed. I need to add WHERE of_idx IS speed ...

    ... ok, done that - but now the numbers look too low. My query is:

    SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 100) AND (pval BETWEEN 1 AND 50) AND (of_idx IS 13) GROUP BY level, pval ORDER BY level, pval

    and this shows me only 30 +speed items dropped at dl1 in 250,000 games. More worryingly, it doesn't show any kind of spike at dl5 (where =telep comes into depth) or dl15 (where =escaping comes into depth). So I am now wondering whether the wearables_pval_flags table is right - I'll need to cross-check against a query about those specific rings to be certain.
    Last edited by Magnate; August 31, 2011, 13:38.

    Leave a comment:


  • Timo Pietilä
    replied
    Originally posted by Magnate
    @NinjaN - thank you, that's awesome! Really easy to follow - I went to the database and re-phrased your query into this:

    SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 50) AND (pval BETWEEN 1 AND 4) GROUP BY level, pval ORDER BY level, pval

    and it worked, first time! There were 67 items with +4 speed dropped on dl1, out of a total of 734 items with +speed dropped on dl1 (that seems quite high - 0.3% chance in any given game of getting a speed item on dl1?!).
    Those would be Ring of Escaping & Rings of Teleportation. +4 & +2 speed, not much OoD.

    Leave a comment:


  • Magnate
    replied
    @NinjaN - thank you, that's awesome! Really easy to follow - I went to the database and re-phrased your query into this:

    SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 50) AND (pval BETWEEN 1 AND 4) GROUP BY level, pval ORDER BY level, pval

    and it worked, first time! There were 67 items with +4 speed dropped on dl1, out of a total of 734 items with +speed dropped on dl1 (that seems quite high - 0.3% chance in any given game of getting a speed item on dl1?!). Many thanks to you and pav for your willingness to help - I will come back with more queries on queries.

    @SaThaRiel - the table structure is set out in the source code in src/stats/ - myshkin designed it, and so far it has given me all the info I have asked for!

    @AnonymousHero - you may be right that perl/whatever would be better for running custom queries, but we chose SQL so that the raw data was transferable between people in a format that's widely used. Generating the data takes quite a long time (1.2s per run for me) and has a footprint of ~2.5GB, so we wanted to enable people to query data without having to run stats themselves.

    Leave a comment:


  • NinjaN
    replied
    What better way to have a first post than this?


    I wouldn't call myself SQL guru, but I'm using it on a daily basis, it's what I do at work...

    What you are looking for here is one of many aggregation functions, namely COUNT().

    In your query, you want to have two fields: the value of +speed and how often that value was found. So your query begins like this (keywords in upper case):

    SELECT plusspeed, COUNT(*)
    FROM yourtable

    The function COUNT() does what it says; it counts the rows in your result. Only COUNT(*) in your query would give you the total sum of results fitting your filters. Put some detail fields in your SELECT and you can group that sum (more on that later).
    Then you want to filter what is queried, so your query becomes:

    SELECT plusspeed, COUNT(*)
    FROM yourtable
    WHERE (dungeonlevel BETWEEN 1 AND 50)
    AND (plusspeed BETWEEN 1 AND 4)

    This won't work, it will give you an error message. This is because of that aggregration function COUNT(). You need to group every field in SELECT which is _not_ aggregated (in this example only plusspeed):

    SELECT plusspeed, COUNT(*)
    FROM yourtable
    WHERE (dungeonlevel BETWEEN 1 AND 50)
    AND (plusspeed BETWEEN 1 AND 4)
    GROUP BY plusspeed

    Last, you could order your result:

    SELECT plusspeed, COUNT(*)
    FROM yourtable
    WHERE (dungeonlevel BETWEEN 1 AND 50)
    AND (plusspeed BETWEEN 1 AND 4)
    GROUP BY plusspeed
    ORDER BY plusspeed


    If you want to go into detail, e. g. see at which dungeonlevel which speed enhancement was found, you can have two (or more) detail fields. Just remember to group these:


    SELECT dungeonlevel, plusspeed, COUNT(*)
    FROM yourtable
    WHERE (dungeonlevel BETWEEN 1 AND 50)
    AND (plusspeed BETWEEN 1 AND 4)
    GROUP BY dungeonlevel, plusspeed
    ORDER BY dungeonlevel, plusspeed


    The last two queries should work for your table, given you query existing fields, of course.
    Feel free to ask away, I find writing SQL to be quite fun. Also, if you happen to have more questions, I'd be happy to try to help you answering those. Knowing (and having access) to your database would help there... ;-P

    Hope you can understand this blubber of mine... ;-)


    Cheers, NinjaN

    Leave a comment:


  • AnonymousHero
    replied
    IMO, SQL is not really all that great for this kind of "spelunking". You'd probably be better of just doing this using custom python/perl scripts or something.

    ... at least as long as you can either 1) fit data into memory, or 2) do whatever you need to do in a few passes. If you can't you're probably going to need a full database with good indexes anyway... that is likely to be at least as much effort as just writing a few scripts.

    Of course, if you just want to learn... have at it!

    EDIT: In other words: RDMBS'es aren't magic that you can sprinkle onto a "lots of data" problem and expect things to just work faster.

    Leave a comment:


  • pav
    replied
    Sure, feel free to fire your questions my way.

    Leave a comment:


  • SaThaRiel
    replied
    Which database engine are you using?
    Also it maybe interesting to see the table structure.

    Leave a comment:

Working...
😀
😂
🥰
😘
🤢
😎
😞
😡
👍
👎