Will Druid continue to work fast even if my SELECT COUNT(*) ... has no time boundaries?

Time:01-13

I have a statement which looks like the following to compute a total of counts:

SELECT id, SUM("count") AS total FROM my_table GROUP BY id

Say Druid ingests about 1 million rows of data per day. The rows are relatively small (about 20 columns, with the longest string around 100 characters). Each row includes a date and an identifier. The data is aggregated by id in 5-minute windows.

Will that SELECT statement continue to be fast after a few years of data ingestion?

CodePudding user response:

Druid is surprisingly fast at GROUP BY queries because it stores data in column-oriented, time-partitioned segments and its query engine is optimized for reading them.

Will that SELECT statement continue to be fast after a few years of data ingestion?

I think the real question is how you are rolling up the data. If you have enabled compaction and are doing monthly rollups, then the above query should not pose an issue. You can read more about automatic compaction here: https://druid.apache.org/docs/latest/data-management/automatic-compaction.html
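To get a feel for why the rollup granularity matters here, a rough back-of-the-envelope estimate can help. The sketch below is only an illustration under stated assumptions: the 1 million rows per day and the 5-minute windows come from the question, while DISTINCT_IDS is a made-up figure and the estimate assumes id is the only dimension kept after rollup. The function names are hypothetical and are not part of any Druid API.

```python
# Rough upper-bound estimate of stored rows; not a Druid API, just arithmetic.
RAW_ROWS_PER_DAY = 1_000_000      # from the question
WINDOWS_PER_DAY = 24 * 60 // 5    # 5-minute windows per day
DISTINCT_IDS = 10_000             # ASSUMPTION for illustration only


def rows_at_five_minute_rollup(days: int) -> int:
    """Upper bound with 5-minute rollup.

    Rollup cannot store more rows than were ingested, nor more than
    one row per (id, window) pair when id is the only dimension.
    """
    per_day = min(RAW_ROWS_PER_DAY, DISTINCT_IDS * WINDOWS_PER_DAY)
    return days * per_day


def rows_at_monthly_rollup(months: int) -> int:
    """Upper bound after compaction to monthly granularity:
    at most one row per (id, month) pair when id is the only dimension."""
    return months * DISTINCT_IDS


if __name__ == "__main__":
    years = 3
    fine = rows_at_five_minute_rollup(365 * years)
    coarse = rows_at_monthly_rollup(12 * years)
    print(f"5-minute rollup, upper bound: {fine:,} rows")
    print(f"monthly rollup, upper bound:  {coarse:,} rows")
```

Under these assumptions, the unbounded GROUP BY has to scan at most a few hundred thousand rows after monthly rollup, instead of up to roughly a billion at 5-minute granularity. Real numbers depend on how many dimensions survive the rollup and how many distinct ids there are.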

If you have more questions, please feel free to open an issue on GitHub: https://github.com/apache/druid/issues or find us on the Druid Slack channel: https://druid.apache.org/community/
