Home > OS >  Druid -> Order data by another column than timestamp
Druid -> Order data by another column than timestamp

Time:12-08

I'm using scan query in druid. I'm looking for a way to sort data using some query. How I can do this?

Now I have:

DataSource: Data,
Intervals:  "1000"/"2000",
Limit:      10
Legacy:     true, 

I have column "values" and I want to sort data by this column (no by timestamp) and return each column from table but sorted by "values".

Something like:

SELECT __time, value, company, count
FROM shares
ORDER BY 1 ASC
WHERE value > 200

CodePudding user response:

Tried a similar query with the wikipedia test data :

SELECT namespace, channel, cityName, sum_added
FROM "wikipedia_demo" r 
WHERE sum_added > 30
ORDER BY sum_added DESC

which results in an error:

Error: Unknown exception

Cannot build plan for query: SELECT namespace, channel, cityName, sum_added FROM "wikipedia_demo" r WHERE sum_added > 30 ORDER BY sum_added DESC

org.apache.druid.java.util.common.ISE

The reason is that ORDER BY is only allowed on GROUP BY columns, aggregate expressions or if no grouping is done, then only on __time. Take a look at the docs here: https://druid.apache.org/docs/latest/querying/sql.html#order-by

If you are not aggregating, you can still use GROUP BY selecting all the SELECT expressions and then ORDER BY any of them, as in:

SELECT namespace, channel, cityName, sum_added
FROM "wikipedia_demo" r 
WHERE sum_added > 30
GROUP BY 1,2,3,4
ORDER BY sum_added DESC

Caution: since it is time-series data, it is a good practice to include a condition on __time to avoid scanning the whole table.

CodePudding user response:

Please also see the following docs page on the options you have regarding order on scan queries. https://druid.apache.org/docs/latest/querying/scan-query.html#time-ordering

  • Related