Home > Blockchain >  SQL query - order one group of columns by another column
SQL query - order one group of columns by another column

Time:11-01

I intend to create a table with millions of rows, organized in the following way:

file,word,occurrences

file23,apples,343<br>
file17,apples,3<br>
file10341,apples,0<br>
...<br>
file16000,apples,0

This table will have 16,000 unique words, and their occurrences in 16,000 unique files.

I want to find a way to query the table by targeting specific words, and then finding only the most relevant results - so the top 10 files with most occurrences.

I have tried the following code and it's variations, which has not worked:

WHERE/HAVING `word`=('apples', 'oranges', 'prunes') ORDER BY `occurrences` DESC

That being said, how can I perform this query, i.e., to focus specifically on the relevant words, list them by occurrences, and grab the top 10-15 most relevant files?

CodePudding user response:

That's what a FULLTEXT index is for. The table would have one row per file, so a small 16K rows. There would be one TEXT column with the words in that file. Then MATCH(col) AGAINST(" apples oranges" IN BOOLEAN MODE) would provide the rows with all those words, plus the relevance. ("Relevance" is not quite "occurrences".)

If you choose to stick with your current structure, "occurrences" is not well-specified. What if there are 100 apples and 20 oranges in one file; is that exactly as "relevant" as a file with 60 occurrences of each? Should a file with 'apples' but no 'oranges' be considered? Or do all the given words need to be present?

BTW, this is the syntax you are looking for:

`word` IN ('apples', 'oranges', 'prunes')

A question about your data. (This could impact performance.) How many 'files' have a typical word? Typically, how many files have all the requested words?

CREATE TABLE t (
    `file` VARCHAR(...) NOT NULL,
    `word` VARCHAR(...) NOT NULL,
    occurrences INT NOT NULL,
    PRIMARY KEY(`word`, `file`),
    INDEX(`file`)
) ENGINE=InnoDB;

SELECT `file`, SUM(occurrences) as tot_occ
    FROM t
    WHERE `word` IN ('apples', 'oranges', 'prunes')
    GROUP BY `file`
    ORDER BY tot_occ DESC
    LIMIT 10;

CodePudding user response:

Simple order by with Limit should work.

select file, word, occurrences from yourtable where word in ('apples', 'oranges', 'prunes')  ORDER BY occurrences DESC
Limit 10

** Instead of word=('apples', 'oranges', 'prunes') you need to use word in ('apples', 'oranges', 'prunes')

** Having is not required since there is no group by clause

  • Related