I have a table called device_data which looks something like below
Column | Type | Collation | Nullable | Default
------------------ ----------------------------- ----------- ---------- ---------
id | integer | | |
date | timestamp without time zone | | |
upload | real | | |
download | real | | |
Now when I do a daily truncation on data, I do something like below
First query
SELECT
date_trunc('day', date) as daily,
avg(upload) as avg_upload,
avg(download) as avg_download
FROM device_info
WHERE
date BETWEEN '2022-10-07 10:28:46' AND '2022-11-06 10:28:46'
GROUP BY daily
ORDER BY daily;
I get the below result
daily | avg_upload | avg_download
--------------------- -------------------- --------------------
2022-10-07 00:00:00 | 41.691493286006484 | 41.571846902122246
2022-10-08 00:00:00 | 23.348292880412536 | 25.450085919814388
2022-10-09 00:00:00 | 14.496747577637498 | 15.03089042501744
2022-10-10 00:00:00 | 108.54434560738528 | 296.091238792341
2022-10-11 00:00:00 | 163.0374394524876 | 140.3484673696004
2022-10-12 00:00:00 | 240.54915519812943 | 74.1109145983748
2022-10-13 00:00:00 | 94.02086876518116 | 162.71920269686137
2022-10-14 00:00:00 | 58.254662611037375 | 62.832775704871615
2022-10-15 00:00:00 | 33.759871942986635 | 27.21136256905281
...
Now if I run the below query which I assume does the same thing, I get slightly different result
Second query
SELECT
date_trunc('month', date) date_part('day', date)::int / 1 * interval '1 day' AS daily,
avg(upload) as avg_upload,
avg(download) as avg_download
FROM device_info
WHERE
date BETWEEN '2022-10-07 10:28:46' and '2022-11-06 10:28:46'
GROUP BY daily
ORDER BY daily ASC;
I get the below result
daily | avg_upload | avg_download
--------------------- -------------------- --------------------
2022-10-08 00:00:00 | 41.691493286006484 | 41.571846902122246
2022-10-09 00:00:00 | 23.348292880412536 | 25.450085919814388
2022-10-10 00:00:00 | 14.496747577637498 | 15.03089042501744
2022-10-11 00:00:00 | 108.54434560738528 | 296.091238792341
2022-10-12 00:00:00 | 163.0374394524876 | 140.3484673696004
2022-10-13 00:00:00 | 240.54915519812943 | 74.1109145983748
2022-10-14 00:00:00 | 94.02086876518116 | 162.71920269686137
2022-10-15 00:00:00 | 58.254662611037375 | 62.832775704871615
2022-10-16 00:00:00 | 33.759871942986635 | 27.21136256905281
So you can see although both the output appears to be similar, the second query shifts the data by one day ie
First query (first row)
2022-10-07 00:00:00 | 41.691493286006484 | 41.571846902122246
Second query (first row)
2022-10-08 00:00:00 | 41.691493286006484 | 41.571846902122246
So where exactly is my second query going wrong (my assumption being that both queries do the same thing)?
Note: The reason I am asking this is because I would want a weekly
truncation of data and a 6 days
truncation of data where the second query can come handy.
CodePudding user response:
So where exactly is my second query going wrong?
You are assuming that date_trunc()
returns a timestamp with the day of month set to zero, which it of course does not, it returns the first day of the month. So you have to reduce your offset by one day:
...
date_trunc('month', date) (date_part('day', date)::int - 1) / 1 * interval '1 day' AS daily,
...
Remember: There are only two hard problems in computer sience:
- cache invalidation
- naming things
- off-by-one errors