I have a query like the following that computes a total of counts:
SELECT id, SUM("count") AS total FROM my_table GROUP BY id
Say Druid ingests about 1 million rows of data per day. The rows are relatively small (around 20 columns; the longest string is about 100 characters). Each row includes a date and an identifier. The data is aggregated by id in 5-minute windows.
Will that SELECT statement continue to be fast after a few years of data ingestion?
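To put the question in numbers, here is a rough back-of-the-envelope sketch of how many rows would be stored after a few years at different rollup granularities. The 1 million rows/day and 5-minute windows come from the question; the number of distinct ids is not stated, so `num_ids` is a hypothetical parameter you would replace with your real cardinality. These are upper bounds, assuming the id is the only dimension rollup groups on.

```python
# Rough upper bounds on stored row counts for different rollup granularities.
# Assumption (hypothetical): num_ids distinct ids, and id is the only dimension.

RAW_ROWS_PER_DAY = 1_000_000     # from the question
WINDOWS_PER_DAY = 24 * 60 // 5   # 5-minute windows -> 288 per day
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12


def stored_rows(years: int, num_ids: int) -> dict[str, int]:
    """Upper bound on rows stored after `years` of ingestion."""
    days = years * DAYS_PER_YEAR
    raw = RAW_ROWS_PER_DAY * days
    # Rollup only merges rows, it never adds any, so each bound is capped by raw.
    five_min = min(raw, num_ids * WINDOWS_PER_DAY * days)
    monthly = min(raw, num_ids * MONTHS_PER_YEAR * years)
    return {"no rollup": raw, "5-minute rollup": five_min, "monthly rollup": monthly}


if __name__ == "__main__":
    for name, rows in stored_rows(years=3, num_ids=1_000).items():
        print(f"{name:>16}: {rows:,}")
```

With 1,000 ids over 3 years this gives about 1.1 billion raw rows, about 315 million rows at 5-minute rollup, and only 36,000 rows at monthly rollup. With many more ids (say 10,000), 5-minute rollup barely helps, because 288 windows times the id count already exceeds the raw daily volume.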
Answer:
Druid is surprisingly fast at GROUP BY queries because of the way it stores data (in columnar, time-partitioned segments) and the way its query engine is optimized to read that data.
Will that SELECT statement continue to be fast after a few years of data ingestion?
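I think the real question is how you roll up the data. If you have enabled compaction and roll the data up to monthly granularity, the query above should not be a problem: rollup merges all rows that share the same month and the same dimension values into one row, so the number of rows Druid has to scan grows with the number of ids and months rather than with the number of ingested rows. You can read more about automatic compaction here: https://druid.apache.org/docs/latest/data-management/automatic-compaction.html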
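The following is a toy Python model of what monthly rollup does to the data and why it does not change the answer to the query. It is not Druid's implementation; it assumes each row is a `(timestamp, id, count)` tuple and that `count` is rolled up with a sum aggregator (an assumption about your ingestion spec). The helper names `truncate_to_month`, `rollup` and `total_by_id` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def truncate_to_month(ts: datetime) -> datetime:
    """Truncate a timestamp to the first instant of its month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def rollup(rows, truncate):
    """Merge rows sharing (truncated timestamp, id), summing the count metric."""
    merged = defaultdict(int)
    for ts, row_id, count in rows:
        merged[(truncate(ts), row_id)] += count
    return [(ts, row_id, total) for (ts, row_id), total in merged.items()]


def total_by_id(rows):
    """Equivalent of: SELECT id, SUM("count") AS total FROM my_table GROUP BY id"""
    totals = defaultdict(int)
    for _ts, row_id, count in rows:
        totals[row_id] += count
    return dict(totals)


raw = [
    (datetime(2023, 1, 5, 10, 0), "a", 3),
    (datetime(2023, 1, 5, 10, 5), "a", 2),
    (datetime(2023, 1, 20, 8, 0), "b", 7),
    (datetime(2023, 2, 1, 0, 0), "a", 1),
]
monthly = rollup(raw, truncate_to_month)

print(len(raw), "->", len(monthly), "rows")
print(total_by_id(raw) == total_by_id(monthly))
```

The four raw rows collapse to three monthly rows, yet both produce the totals `{"a": 6, "b": 7}`. At real volumes the reduction is far larger, which is what keeps the GROUP BY cheap as years of data accumulate.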
If you have more questions, feel free to open an issue on GitHub: https://github.com/apache/druid/issues or find us on the Druid Slack channel: https://druid.apache.org/community/