I have a query that collects information from different tables for a particular time range.
Currently, I make requests for every user and every date range separately, but I would like to run it for all time ranges at once, where the time range is every seven days between user_opened_account_at and user_closed_account_at, which is different for every user.
Is there any proper way to do this in one query?
Query:
SELECT
usr.id as user_id,
usr."onboardedAt" as user_opened_account_at,
usr."closedAt" as user_closed_account_at,
'2021-01-01' as start_range_date,
'2021-01-08' as end_range_date,
tx.tx_count as tx_count,
last_user_action.action as last_user_action
FROM "Users" usr
LEFT JOIN (
SELECT
"userId",
COUNT("id") as "tx_count"
FROM "Transactions"
WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
GROUP BY "userId"
) tx ON usr.id = tx."userId"
LEFT JOIN (
SELECT "userId", "action"
FROM "UserActions"
WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
ORDER BY "createdAt" DESC
LIMIT 1
) last_user_action ON usr.id = last_user_action."userId"
WHERE usr.id = 1
ORDER BY user_id, start_range_date
Schema:
CREATE TABLE "Users" (
id bigserial PRIMARY KEY,
"onboardedAt" timestamp with time zone,
"closedAt" timestamp with time zone
);
CREATE TABLE "Transactions" (
id bigserial PRIMARY KEY,
"userId" bigint,
"createdAt" timestamp with time zone,
amount numeric(20,8) NOT NULL DEFAULT 0
);
CREATE TABLE "UserActions" (
id bigserial PRIMARY KEY,
"userId" bigint,
"createdAt" timestamp with time zone,
action character varying(255) NOT NULL
);
INSERT INTO "Users" ("onboardedAt", "closedAt") VALUES
( '2021-01-01', '2021-02-01' ),
( '2021-01-01', '2021-02-01' ),
( '2021-01-01', '2021-02-01' ),
( '2021-02-01', '2021-03-01' ),
( '2021-02-01', '2021-03-01' );
INSERT INTO "Transactions" ("userId", "createdAt", "amount") VALUES
( 1, '2021-01-02', 100 ),
( 1, '2021-01-08', -100 ),
( 1, '2021-01-15', -200 ),
( 1, '2021-01-22', 200 ),
( 2, '2021-01-02', -100 ),
( 2, '2021-01-02', 100 ),
( 2, '2021-01-15', -200 ),
( 2, '2021-01-16', 200 ),
( 3, '2021-01-02', 100 ),
( 3, '2021-01-08', -100 ),
( 3, '2021-01-15', -200 ),
( 3, '2021-01-22', 200 ),
( 4, '2021-02-02', 50 ),
( 4, '2021-02-08', -100 ),
( 4, '2021-02-15', -200 ),
( 4, '2021-02-22', 200 ),
( 5, '2021-02-02', 200 ),
( 5, '2021-02-08', -400 ),
( 5, '2021-02-15', -600 ),
( 5, '2021-02-22', 200 );
INSERT INTO "UserActions" ("userId", "createdAt", "action") VALUES
( 1, '2021-01-01', 'PLAY' ),
( 1, '2021-01-01', 'PLAY' ),
( 1, '2021-01-02', 'DEPOSIT' ),
( 1, '2021-01-08', 'DEPOSIT' ),
( 1, '2021-01-09', 'PLAY' ),
( 1, '2021-01-15', 'PLAY' ),
( 1, '2021-01-22', 'PLAY' ),
( 2, '2021-01-01', 'PLAY' ),
( 2, '2021-01-01', 'PLAY' ),
( 2, '2021-01-02', 'DEPOSIT' ),
( 2, '2021-01-08', 'DEPOSIT' ),
( 2, '2021-01-09', 'PLAY' ),
( 2, '2021-01-15', 'PLAY' ),
( 2, '2021-01-22', 'PLAY' ),
( 3, '2021-01-01', 'PLAY' ),
( 3, '2021-01-01', 'PLAY' ),
( 3, '2021-01-02', 'DEPOSIT' ),
( 3, '2021-01-08', 'DEPOSIT' ),
( 3, '2021-01-09', 'PLAY' ),
( 3, '2021-01-15', 'PLAY' ),
( 3, '2021-01-22', 'PLAY' ),
( 4, '2021-02-01', 'DEPOSIT' ),
( 4, '2021-02-01', 'PLAY' ),
( 4, '2021-02-02', 'DEPOSIT' ),
( 4, '2021-02-08', 'DEPOSIT' ),
( 4, '2021-02-09', 'PLAY' ),
( 4, '2021-02-15', 'PLAY' ),
( 4, '2021-02-22', 'PLAY' ),
( 5, '2021-02-01', 'DEPOSIT' ),
( 5, '2021-02-01', 'PLAY' ),
( 5, '2021-02-02', 'PLAY' ),
( 5, '2021-02-08', 'PLAY' ),
( 5, '2021-02-09', 'PLAY' ),
( 5, '2021-02-15', 'DEPOSIT' ),
( 5, '2021-02-22', 'PLAY' );
CodePudding user response:
Sure. You do have to use a LATERAL join so that you can use the column values from your left table (Users) in the generate_series() table expression, but otherwise it's mostly what you'd expect. Some simplified SQL showing the important parts follows, please add a dbfiddle link with sample data if you'd like fully-working code.
SELECT u.user_id, week_start, count(t.transactions) tx_count
from users AS u
CROSS JOIN LATERAL generate_series(u.onboarded_at, u.account_closed_at, interval '1 week')
AS week_start
LEFT JOIN transactions AS t
ON t.created_at >= week_start AND
AND t.created_at < (week_start interval '1 week')
GROUP BY 1, 2;
Note that this is still mostly a glorified for-loop server-side, but that will almost always be significantly more performant than a for-loop in your code that roundtrips to the DB.
CodePudding user response:
With all weeks starting on Monday, this would do it (efficiently):
SELECT id AS user_id, u."onboardedAt", u."closedAt"
, week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
FROM "Users" u
CROSS JOIN generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT JOIN (
SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
FROM "Transactions" t
GROUP BY 1, 2
) t USING (id, week_start)
LEFT JOIN (
SELECT DISTINCT ON (1, 2)
"userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
FROM "UserActions" a
ORDER BY 1, 2, "createdAt" DESC
) a USING (id, week_start)
ORDER BY id, week_start;
db<>fiddle here
Working with standard weeks makes everything much simpler. We can aggregate in the "many" tables before joining, which is simpler and cheaper. Else, multiple joins can go wrong quickly. See:
Standard weeks make it easier to compare data, too. (Note that first and last week per user can be truncated (span fewer days). But that applies to the last week per user in any case.)
The LATERAL
keyword is assumed automatically in a join to a set-returning function:
CROSS JOIN generate_series(...)
See:
Using DISTINCT ON
to get the last_user_action
per user. See:
I advise to user legal, lower-case identifiers, so double-quoting is not required. Makes your life with Postgres easier.
Use last non-null action
Added in a comment:
if action is null in a current week, I want to grab most recent from previous weeks
SELECT user_id, "onboardedAt", "closedAt", week_start, tx_count
, last_user_action AS last_user_action_with_null
, COALESCE(last_user_action, max(last_user_action) OVER (PARTITION BY user_id, null_grp)) AS last_user_action
FROM (
SELECT id AS user_id, u."onboardedAt", u."closedAt"
, week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
, count(a.last_user_action) OVER (PARTITION BY id ORDER BY week_start) AS null_grp
FROM "Users" u
CROSS JOIN generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT JOIN (
SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
FROM "Transactions" t
GROUP BY 1, 2
) t USING (id, week_start)
LEFT JOIN (
SELECT DISTINCT ON (1, 2)
"userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
FROM "UserActions" a
ORDER BY 1, 2, "createdAt" DESC
) a USING (id, week_start)
) sub
ORDER BY user_id, week_start;
db<>fiddle here
Explanation: