@Xaxyx:
You are not uninvited, at least not more than me... ;-P
Your query should work as well...
In fact, it should even work better than my solution, in other words: it will be faster (because there're no subqueries which need more time during execution). The results should be identical.
As I said, I'm no guru, I still learn something new every day... I will try to test your solution whe I get the chance...
Cheers, NinjaN
Looking for a SQL guru
Collapse
X
-
Pardon me for jumping in uninvited, but aren't we overcomplicating things a bit? Are we just looking for counts?
SELECT level,
SUM(CASE WHEN pval = 1 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +1',
SUM(CASE WHEN pval = 2 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +2',
SUM(CASE WHEN pval = 3 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +3',
SUM(CASE WHEN pval = 4 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed +4',
SUM(CASE WHEN pval > 4 AND of_idx = 13 THEN 1 ELSE 0 END) AS 'Speed > +4'
FROM wearable_pval_flags
WHERE level BETWEEN 1 AND 50
GROUP BY level
ORDER BY levelLeave a comment:
-
Thanks again - it's getting a little more complicated now so it might take me a while to get my head around the aliasing stuff, but I will definitely try something like this to get a table of results.Leave a comment:
-
@Timo:
SELECT *, COUNT(*)... will not work, at least as far as I know. You can't group by * which is necessary for COUNT() to work. One could SELECT every field and GROUP BY every field but then COUNT() is almost certainly always 1 (because each unique set of fields exists once, eg. there is an autoincremented counter for id'ing that row).
@Magnate:
Your query
SELECT level, pval, COUNT(*)
FROM `wearables_pval_flags`
WHERE (level BETWEEN 1 AND 100)
AND (pval BETWEEN 1 AND 50)
AND (of_idx IS 13)
GROUP BY level, pval
ORDER BY level, pval
is correct in syntax. Since I don't know about the structure of the database I can't say if it is 'asking' for the correct data...
@Azerath:
You are correct, subqueries can make this result better readable. The query should work like this:
SELECT DISTINCT level,
(SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 1) AND (of_idx = 13)) AS 'Speed +1',
(SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 2) AND (of_idx = 13)) AS 'Speed +2',
(SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 3) AND (of_idx = 13)) AS 'Speed +3',
(SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval = 4) AND (of_idx = 13)) AS 'Speed +4',
(SELECT COUNT(*) AS '+1' FROM 'wearable_pval_flags' wpf2 WHERE (wpf2.level = wpf1.level) AND (pval >= 5) AND (of_idx = 13)) AS 'Speed > +4'
FROM 'wearable_pval_flafs' wpf1
WHERE (wpf1.level BETWEEN 1 AND 50)
ORDER BY level
@Magnate:
What the above query does is the following:
It asks for all unique entries for level (DISTINCT is the Keyword for that) in your table and reports them once only.
Notice the alias 'wpf1' in the FROM part. This alias can be used instead of the full table name. You see that wpf1 is my alias for the 'outer' table while wpf2 is used inside the subqueries. That way, we can address the same table twice while being able to query a specific 'copy' of that table.
In the following nested SELECT-statements (also called subqueries), the value for level is put in as filter (wpf2.level = wpf1.level) while the other filters are just for wpf2 (no alias always means 'this table' which is the inner one). Subqueries can only deliver exactly one field, COUNT(*) in this example.
Behind each subquery you see another alias. This is an alias for that column, so it gets a name in your result. You can also give 'normal' fields an alias, so instead of eg. 'pval' you get 'speed enhancement' in your result. It's important to know that you can't use this alias in other parts of your query. An alias for columns only shows in the final result set, nowhere else. Aliases for tables can be used in your whole query, though.
Hope this is as helpful and understandable as my other post...
Cheers, NinjaNLeave a comment:
-
-
1. When you want to get rectangular table instead of flat list of records, usually databases support such functionality search for (PIVOT in SqlServer; CrossTab in postgress, etc)
Summarizing instread of getting this (flat list of records):
Action1 Ring 123
Action1 Sword 45
Action2 Ring 112
Action2 Sword 55
Action3 Ring 101
Action3 Sword 33
get this (rectangular result):
Action Ring Sword
Action1 123 45
Action2 112 55
Action3 101 33
2. Excluding can be done like in C/C++ AND (... OR ... OR ...), but exclusion from some set use IN or NOT IN
SELECT COUNT(1) FROM Abc WHERE (Level BETWEEN 1 AND 100) AND Kind NOT IN ('Ring', 'Sword')
EDIT:
3. Rectangular result can be achieved using sub queries if are supported by DB (maybe not fastest way, but will work):
[example from http://stackoverflow.com/questions/2...ivot-examples]
select
distinct a,
(select distinct t2.b from t t2 where t1.a=t2.a and t2.b='Ring'),
(select distinct t2.b from t t2 where t1.a=t2.a and t2.b='Sword')
from t t1Leave a comment:
-
It's about 500MB, so I don't think that'd be a good idea. I sent you the url and pw in a PM, but had some problems (Oook said something about a missing security token a few times). If you don't get the PM, let me know and I'll email you.Leave a comment:
-
Magnate: Im not getting any smarter from the src/stats directory...looks more like some db driver capabilities.
Can you mail me the database-file? You should get my mail address from trac.
You qurey looks really wrong to me. But i need to figure out whats wrong.Leave a comment:
-
EDIT: Doh!! I just realised that my query looks at ALL pvals, not just speed. I need to add WHERE of_idx IS speed ...
... ok, done that - but now the numbers look too low. My query is:
SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 100) AND (pval BETWEEN 1 AND 50) AND (of_idx IS 13) GROUP BY level, pval ORDER BY level, pval
and this shows me only 30 +speed items dropped at dl1 in 250,000 games. More worryingly, it doesn't show any kind of spike at dl5 (where =telep comes into depth) or dl15 (where =escaping comes into depth). So I am now wondering whether the wearables_pval_flags table is right - I'll need to cross-check against a query about those specific rings to be certain.Last edited by Magnate; August 31, 2011, 13:38.Leave a comment:
-
@NinjaN - thank you, that's awesome! Really easy to follow - I went to the database and re-phrased your query into this:
SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 50) AND (pval BETWEEN 1 AND 4) GROUP BY level, pval ORDER BY level, pval
and it worked, first time! There were 67 items with +4 speed dropped on dl1, out of a total of 734 items with +speed dropped on dl1 (that seems quite high - 0.3% chance in any given game of getting a speed item on dl1?!).Leave a comment:
-
@NinjaN - thank you, that's awesome! Really easy to follow - I went to the database and re-phrased your query into this:
SELECT level, pval, COUNT(*) FROM `wearables_pval_flags` WHERE (level BETWEEN 1 AND 50) AND (pval BETWEEN 1 AND 4) GROUP BY level, pval ORDER BY level, pval
and it worked, first time! There were 67 items with +4 speed dropped on dl1, out of a total of 734 items with +speed dropped on dl1 (that seems quite high - 0.3% chance in any given game of getting a speed item on dl1?!). Many thanks to you and pav for your willingness to help - I will come back with more queries on queries.
@SaThaRiel - the table structure is set out in the source code in src/stats/ - myshkin designed it, and so far it has given me all the info I have asked for!
@AnonymousHero - you may be right that perl/whatever would be better for running custom queries, but we chose SQL so that the raw data was transferable between people in a format that's widely used. Generating the data takes quite a long time (1.2s per run for me) and has a footprint of ~2.5GB, so we wanted to enable people to query data without having to run stats themselves.Leave a comment:
-
What better way to have a first post than this?
I wouldn't call myself SQL guru, but I'm using it on a daily basis, it's what I do at work...
What you are looking for here is one of many aggregation functions, namely COUNT().
In your query, you want to have two fields: the value of +speed and how often that value was found. So your query begins like this (keywords in upper case):
SELECT plusspeed, COUNT(*)
FROM yourtable
The function COUNT() does what it says; it counts the rows in your result. Only COUNT(*) in your query would give you the total sum of results fitting your filters. Put some detail fields in your SELECT and you can group that sum (more on that later).
Then you want to filter what is queried, so your query becomes:
SELECT plusspeed, COUNT(*)
FROM yourtable
WHERE (dungeonlevel BETWEEN 1 AND 50)
AND (plusspeed BETWEEN 1 AND 4)
This won't work, it will give you an error message. This is because of that aggregration function COUNT(). You need to group every field in SELECT which is _not_ aggregated (in this example only plusspeed):
SELECT plusspeed, COUNT(*)
FROM yourtable
WHERE (dungeonlevel BETWEEN 1 AND 50)
AND (plusspeed BETWEEN 1 AND 4)
GROUP BY plusspeed
Last, you could order your result:
SELECT plusspeed, COUNT(*)
FROM yourtable
WHERE (dungeonlevel BETWEEN 1 AND 50)
AND (plusspeed BETWEEN 1 AND 4)
GROUP BY plusspeed
ORDER BY plusspeed
If you want to go into detail, e. g. see at which dungeonlevel which speed enhancement was found, you can have two (or more) detail fields. Just remember to group these:
SELECT dungeonlevel, plusspeed, COUNT(*)
FROM yourtable
WHERE (dungeonlevel BETWEEN 1 AND 50)
AND (plusspeed BETWEEN 1 AND 4)
GROUP BY dungeonlevel, plusspeed
ORDER BY dungeonlevel, plusspeed
The last two queries should work for your table, given you query existing fields, of course.
Feel free to ask away, I find writing SQL to be quite fun. Also, if you happen to have more questions, I'd be happy to try to help you answering those. Knowing (and having access) to your database would help there... ;-P
Hope you can understand this blubber of mine... ;-)
Cheers, NinjaNLeave a comment:
-
IMO, SQL is not really all that great for this kind of "spelunking". You'd probably be better of just doing this using custom python/perl scripts or something.
... at least as long as you can either 1) fit data into memory, or 2) do whatever you need to do in a few passes. If you can't you're probably going to need a full database with good indexes anyway... that is likely to be at least as much effort as just writing a few scripts.
Of course, if you just want to learn... have at it!
EDIT: In other words: RDMBS'es aren't magic that you can sprinkle onto a "lots of data" problem and expect things to just work faster.Leave a comment:
-
Which database engine are you using?
Also it maybe interesting to see the table structure.Leave a comment:
Leave a comment: