I'm using scan query in druid. I'm looking for a way to sort data using some query. How I can do this?
Now I have:
DataSource: Data,
Intervals: "1000"/"2000",
Limit: 10
Legacy: true,
I have column "values" and I want to sort data by this column (no by timestamp) and return each column from table but sorted by "values".
Something like:
SELECT __time, value, company, count
FROM shares
ORDER BY 1 ASC
WHERE value > 200
CodePudding user response:
Tried a similar query with the wikipedia test data :
SELECT namespace, channel, cityName, sum_added
FROM "wikipedia_demo" r
WHERE sum_added > 30
ORDER BY sum_added DESC
which results in an error:
Error: Unknown exception
Cannot build plan for query: SELECT namespace, channel, cityName, sum_added FROM "wikipedia_demo" r WHERE sum_added > 30 ORDER BY sum_added DESC
org.apache.druid.java.util.common.ISE
The reason is that ORDER BY is only allowed on GROUP BY columns, aggregate expressions or if no grouping is done, then only on __time. Take a look at the docs here: https://druid.apache.org/docs/latest/querying/sql.html#order-by
If you are not aggregating, you can still use GROUP BY selecting all the SELECT expressions and then ORDER BY any of them, as in:
SELECT namespace, channel, cityName, sum_added
FROM "wikipedia_demo" r
WHERE sum_added > 30
GROUP BY 1,2,3,4
ORDER BY sum_added DESC
Caution: since it is time-series data, it is a good practice to include a condition on __time to avoid scanning the whole table.
CodePudding user response:
Please also see the following docs page on the options you have regarding order on scan
queries.
https://druid.apache.org/docs/latest/querying/scan-query.html#time-ordering