I have table A and table B. Each row in table A represents every time a user sends a message. Each row in table B represents every time a user buys a gift.
Goal: for each time a user sends a message, calculate how many gifts they've purchased within 7 days before the timestamp they sent the message. Some users never send messages and some never purchased gifts. If the user in table A didn't have gift purchased within 7 days, the count should be 0.
Table A:
user_id | time |
---|---|
12345 | 2021-09-04 09:43:55 |
12345 | 2021-09-03 00:39:30 |
12345 | 2021-09-02 03:26:07 |
12345 | 2021-09-05 15:48:34 |
23456 | 2021-09-09 09:06:22 |
23456 | 2021-09-08 08:06:21 |
00001 | 2021-09-03 15:38:15 |
00002 | 2021-09-03 15:38:15 |
Table B:
user_id | time |
---|---|
12345 | 2021-09-01 09:43:55 |
12345 | 2021-08-03 00:42:30 |
12345 | 2021-09-03 02:16:07 |
00003 | 2021-09-05 15:48:34 |
23456 | 2021-09-03 09:06:22 |
23456 | 2021-09-10 08:06:21 |
Expected output:
user_id | time | count |
---|---|---|
12345 | 2021-09-04 09:43:55 | 2 |
12345 | 2021-09-03 00:39:30 | 1 |
12345 | 2021-09-02 03:26:07 | 1 |
12345 | 2021-09-05 15:48:34 | 2 |
23456 | 2021-09-09 09:06:22 | 1 |
23456 | 2021-09-08 08:06:21 | 1 |
00001 | 2021-09-03 15:38:15 | 0 |
00002 | 2021-09-03 15:38:15 | 0 |
Query I tried:
SELECT A.user_id, A.time, coalesce(count(*), 0) as count
FROM A
LEFT JOIN B ON A.user_id = B.user_id AND B.time >= A.time - INTERVAL '7 days' AND B.time < A.time
GROUP BY 1,2
The count returned doesn't match the expected result however, not sure if I'm doing the join and conditions correctly.
CodePudding user response:
You need to count the values from the possibly NULL columns i.e. from table B in order to get the correct counts of non-existent matches. i.e. being more specific in COUNT(*)
to COUNT(b.column_from_b_table)
. See modification with working demo fiddle below:
SELECT
A.user_id,
A.time,
coalesce(count(B.user_id), 0) as count
FROM A
LEFT JOIN B ON A.user_id = B.user_id AND
B.time >= A.time - INTERVAL '7 days' AND
B.time < A.time
GROUP BY 1,2;
user_id | time | count |
---|---|---|
1 | 2021-09-03T15:38:15.000Z | 0 |
12345 | 2021-09-05T15:48:34.000Z | 2 |
23456 | 2021-09-08T08:06:21.000Z | 1 |
12345 | 2021-09-04T09:43:55.000Z | 2 |
12345 | 2021-09-03T00:39:30.000Z | 1 |
23456 | 2021-09-09T09:06:22.000Z | 1 |
2 | 2021-09-03T15:38:15.000Z | 0 |
12345 | 2021-09-02T03:26:07.000Z | 1 |
Let me know if this works for you.