Home > Back-end >  Postgres truncate timestamp gives slightly unexpected output
Postgres truncate timestamp gives slightly unexpected output

Time:11-25

I have a table called device_data which looks something like below

      Column      |            Type             | Collation | Nullable | Default 
------------------ ----------------------------- ----------- ---------- ---------
 id               | integer                     |           |          | 
 date             | timestamp without time zone |           |          | 
 upload           | real                        |           |          | 
 download         | real                        |           |          | 

Now when I do a daily truncation on data, I do something like below

First query

SELECT
  date_trunc('day', date) as daily,
  avg(upload) as avg_upload,
  avg(download) as avg_download
FROM device_info
WHERE
  date BETWEEN '2022-10-07 10:28:46' AND '2022-11-06 10:28:46'
GROUP BY daily
ORDER BY daily;

I get the below result

       daily        |     avg_upload     |    avg_download    
--------------------- -------------------- --------------------
 2022-10-07 00:00:00 | 41.691493286006484 | 41.571846902122246
 2022-10-08 00:00:00 | 23.348292880412536 | 25.450085919814388
 2022-10-09 00:00:00 | 14.496747577637498 |  15.03089042501744
 2022-10-10 00:00:00 | 108.54434560738528 |   296.091238792341
 2022-10-11 00:00:00 |  163.0374394524876 |  140.3484673696004
 2022-10-12 00:00:00 | 240.54915519812943 |   74.1109145983748
 2022-10-13 00:00:00 |  94.02086876518116 | 162.71920269686137
 2022-10-14 00:00:00 | 58.254662611037375 | 62.832775704871615
 2022-10-15 00:00:00 | 33.759871942986635 |  27.21136256905281
 ...

Now if I run the below query which I assume does the same thing, I get slightly different result

Second query

SELECT
  date_trunc('month', date)   date_part('day', date)::int / 1 * interval '1 day' AS daily,
  avg(upload) as avg_upload,
  avg(download) as avg_download
FROM device_info
WHERE
  date BETWEEN '2022-10-07 10:28:46' and '2022-11-06 10:28:46'
GROUP BY daily
ORDER BY daily ASC;

I get the below result

       daily         |     avg_upload     |    avg_download    
--------------------- -------------------- --------------------
 2022-10-08 00:00:00 | 41.691493286006484 | 41.571846902122246
 2022-10-09 00:00:00 | 23.348292880412536 | 25.450085919814388
 2022-10-10 00:00:00 | 14.496747577637498 |  15.03089042501744
 2022-10-11 00:00:00 | 108.54434560738528 |   296.091238792341
 2022-10-12 00:00:00 |  163.0374394524876 |  140.3484673696004
 2022-10-13 00:00:00 | 240.54915519812943 |   74.1109145983748
 2022-10-14 00:00:00 |  94.02086876518116 | 162.71920269686137
 2022-10-15 00:00:00 | 58.254662611037375 | 62.832775704871615
 2022-10-16 00:00:00 | 33.759871942986635 |  27.21136256905281

So you can see although both the output appears to be similar, the second query shifts the data by one day ie

First query (first row)

2022-10-07 00:00:00 | 41.691493286006484 | 41.571846902122246

Second query (first row)

2022-10-08 00:00:00 | 41.691493286006484 | 41.571846902122246

So where exactly is my second query going wrong (my assumption being that both queries do the same thing)?

Note: The reason I am asking this is because I would want a weekly truncation of data and a 6 days truncation of data where the second query can come handy.

CodePudding user response:

So where exactly is my second query going wrong?

You are assuming that date_trunc() returns a timestamp with the day of month set to zero, which it of course does not, it returns the first day of the month. So you have to reduce your offset by one day:

...
  date_trunc('month', date)   (date_part('day', date)::int - 1) / 1 * interval '1 day' AS daily,
...

Remember: There are only two hard problems in computer sience:

  • cache invalidation
  • naming things
  • off-by-one errors
  • Related