Home > Mobile >  Best way to handle time consuming queries in InfluxDB
Best way to handle time consuming queries in InfluxDB

Time:10-12

We have an API that queries an Influx database and a report functionality was implemented so the user can query data using a start and end date. The problem is that when a longer period is chosen(usually more than 8 weeks), we get a timeout from influx, query takes around 13 seconds to run. When the query returns a dataset successfully, we store that in cache.

The most time-consuming part of the query is probably comparison and averages we do, something like this:

SELECT mean("value") AS "mean", min("value") AS "min", max("value") AS "max" 
FROM $MEASUREMENT 
WHERE time >= $startDate AND time < $endDate 
AND ("field" = 'myFieldValue' ) 
GROUP BY "tagname"

What would be the best approach to fix this? I can of course limit the amount of weeks the user can choose, but I guess that's not the ideal fix.

How would you approach this? Increase timeout? Batch query? Any database optimization to be able to run this faster?

CodePudding user response:

In such cases where you allow user to select in days, I would suggest to have another table that stores the result (min, max and avg) of each day as a document. This table can be populated using some job after end of the day.

You can also think changing the document per day to per week or per month, based on how you plot the values. You can also add more fields like in your case, tagname and other fields.

Reason why this is superior to using a cache: When you use a cache, you can store the result of the query, so you have to compute for every different combination in realtime. However, in this case, the cumulative results are already available with much smaller dataset to compute.

CodePudding user response:

Based on your query, I assume you are using InfluxDB v1.X. You could try Continuous Queries which are InfluxQL queries that run automatically and periodically on realtime data and store query results in a specified measurement.

In your case, for each report, you could generate a CQ and let your users to query it. e.g.:

Step 1: create a CQ

CREATE CONTINUOUS QUERY "cq_basic_rp" ON "db"
BEGIN
    SELECT mean("value") AS "mean", min("value") AS "min", max("value") AS "max" 
    INTO "mean_min_max"
    FROM $MEASUREMENT 
    WHERE "field" = 'myFieldValue'  // note that the time filter is not here
    GROUP BY time(1h), "tagname"  // here you can define the job interval
END

Step 2: Query against that CQ

SELECT * FROM "mean_min_max"
WHERE time >= $startDate AND time < $endDate // here you can pass the user's time filter

Since you already ask InfluxDB to run these aggregates continuously based on the specified interval, you should be able to trade space for time.

  • Related