Home > Net >  Sql group query results by user id and date ranges dynamically
Sql group query results by user id and date ranges dynamically

Time:11-05

I have a query that collects information from different tables for a particular time range.

Currently, I make requests for every user and every date range separately, but I would like to run it for all time ranges at once, where the time range is every seven days between user_opened_account_at and user_closed_account_at, which is different for every user.

Is there any proper way to do this in one query?

Example: result I have

The result I want to see: Result I want to see

Query:

SELECT 
    usr.id as user_id,
    usr."onboardedAt" as user_opened_account_at,
    usr."closedAt" as user_closed_account_at,
    '2021-01-01' as start_range_date,
    '2021-01-08' as end_range_date,
    tx.tx_count as tx_count,
    last_user_action.action as last_user_action
FROM "Users" usr

LEFT JOIN (
    SELECT 
        "userId",
        COUNT("id") as "tx_count"
    FROM "Transactions"
    WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
    GROUP BY "userId"
) tx ON usr.id = tx."userId"

LEFT JOIN (
    SELECT "userId", "action"
    FROM "UserActions"
    WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
    ORDER BY "createdAt" DESC 
    LIMIT 1
) last_user_action ON usr.id = last_user_action."userId"

WHERE usr.id = 1
ORDER BY user_id, start_range_date

Schema:

CREATE TABLE "Users" (
    id bigserial PRIMARY KEY,
    "onboardedAt" timestamp with time zone,
    "closedAt" timestamp with time zone
);

CREATE TABLE "Transactions" (
    id bigserial PRIMARY KEY,
    "userId" bigint,
    "createdAt" timestamp with time zone,
    amount numeric(20,8) NOT NULL DEFAULT 0
);

CREATE TABLE "UserActions" (
    id bigserial PRIMARY KEY,
    "userId" bigint,
    "createdAt" timestamp with time zone,
    action character varying(255) NOT NULL
);


INSERT INTO "Users" ("onboardedAt", "closedAt") VALUES 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-02-01', '2021-03-01' ), 
    ( '2021-02-01', '2021-03-01' );

INSERT INTO "Transactions" ("userId", "createdAt", "amount") VALUES 
    ( 1, '2021-01-02',  100 ), 
    ( 1, '2021-01-08', -100 ), 
    ( 1, '2021-01-15', -200 ),
    ( 1, '2021-01-22',  200 ),

    ( 2, '2021-01-02', -100 ), 
    ( 2, '2021-01-02',  100 ), 
    ( 2, '2021-01-15', -200 ),
    ( 2, '2021-01-16',  200 ), 

    ( 3, '2021-01-02',  100 ), 
    ( 3, '2021-01-08', -100 ), 
    ( 3, '2021-01-15', -200 ),
    ( 3, '2021-01-22',  200 ),

    ( 4, '2021-02-02',   50 ), 
    ( 4, '2021-02-08', -100 ), 
    ( 4, '2021-02-15', -200 ),
    ( 4, '2021-02-22',  200 ),

    ( 5, '2021-02-02',  200 ), 
    ( 5, '2021-02-08', -400 ), 
    ( 5, '2021-02-15', -600 ),
    ( 5, '2021-02-22',  200 );

INSERT INTO "UserActions" ("userId", "createdAt", "action") VALUES 
    ( 1, '2021-01-01', 'PLAY' ), 
    ( 1, '2021-01-01', 'PLAY' ), 
    ( 1, '2021-01-02', 'DEPOSIT' ), 
    ( 1, '2021-01-08', 'DEPOSIT' ), 
    ( 1, '2021-01-09', 'PLAY' ), 
    ( 1, '2021-01-15', 'PLAY' ), 
    ( 1, '2021-01-22', 'PLAY' ), 

    ( 2, '2021-01-01', 'PLAY' ), 
    ( 2, '2021-01-01', 'PLAY' ), 
    ( 2, '2021-01-02', 'DEPOSIT' ), 
    ( 2, '2021-01-08', 'DEPOSIT' ), 
    ( 2, '2021-01-09', 'PLAY' ), 
    ( 2, '2021-01-15', 'PLAY' ), 
    ( 2, '2021-01-22', 'PLAY' ), 

    ( 3, '2021-01-01', 'PLAY' ), 
    ( 3, '2021-01-01', 'PLAY' ), 
    ( 3, '2021-01-02', 'DEPOSIT' ), 
    ( 3, '2021-01-08', 'DEPOSIT' ), 
    ( 3, '2021-01-09', 'PLAY' ), 
    ( 3, '2021-01-15', 'PLAY' ), 
    ( 3, '2021-01-22', 'PLAY' ), 

    ( 4, '2021-02-01', 'DEPOSIT' ), 
    ( 4, '2021-02-01', 'PLAY' ), 
    ( 4, '2021-02-02', 'DEPOSIT' ), 
    ( 4, '2021-02-08', 'DEPOSIT' ), 
    ( 4, '2021-02-09', 'PLAY' ), 
    ( 4, '2021-02-15', 'PLAY' ), 
    ( 4, '2021-02-22', 'PLAY' ), 

    ( 5, '2021-02-01', 'DEPOSIT' ), 
    ( 5, '2021-02-01', 'PLAY' ), 
    ( 5, '2021-02-02', 'PLAY' ), 
    ( 5, '2021-02-08', 'PLAY' ), 
    ( 5, '2021-02-09', 'PLAY' ), 
    ( 5, '2021-02-15', 'DEPOSIT' ), 
    ( 5, '2021-02-22', 'PLAY' );

CodePudding user response:

Sure. You do have to use a LATERAL join so that you can use the column values from your left table (Users) in the generate_series() table expression, but otherwise it's mostly what you'd expect. Some simplified SQL showing the important parts follows, please add a dbfiddle link with sample data if you'd like fully-working code.

SELECT u.user_id, week_start, count(t.transactions) tx_count
from users AS u
CROSS JOIN LATERAL generate_series(u.onboarded_at, u.account_closed_at, interval '1 week')
    AS week_start
LEFT JOIN transactions AS t
  ON t.created_at >= week_start AND
  AND t.created_at < (week_start   interval '1 week')
GROUP BY 1, 2;

Note that this is still mostly a glorified for-loop server-side, but that will almost always be significantly more performant than a for-loop in your code that roundtrips to the DB.

CodePudding user response:

With all weeks starting on Monday, this would do it (efficiently):

SELECT id AS user_id, u."onboardedAt", u."closedAt"
     , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
FROM   "Users" u
CROSS  JOIN  generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT   JOIN (
   SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
   FROM   "Transactions" t
   GROUP  BY 1, 2
   ) t USING (id, week_start)
LEFT   JOIN (
   SELECT DISTINCT ON (1, 2)
          "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
   FROM   "UserActions" a
   ORDER  BY 1, 2, "createdAt" DESC
   ) a USING (id, week_start)
ORDER  BY id, week_start;

db<>fiddle here

Working with standard weeks makes everything much simpler. We can aggregate in the "many" tables before joining, which is simpler and cheaper. Else, multiple joins can go wrong quickly. See:

Standard weeks make it easier to compare data, too. (Note that first and last week per user can be truncated (span fewer days). But that applies to the last week per user in any case.)

The LATERAL keyword is assumed automatically in a join to a set-returning function:

CROSS  JOIN  generate_series(...)

See:

Using DISTINCT ON to get the last_user_action per user. See:

I advise to user legal, lower-case identifiers, so double-quoting is not required. Makes your life with Postgres easier.

Use last non-null action

Added in a comment:

if action is null in a current week, I want to grab most recent from previous weeks

SELECT user_id, "onboardedAt", "closedAt", week_start, tx_count
     , last_user_action AS last_user_action_with_null
     , COALESCE(last_user_action, max(last_user_action) OVER (PARTITION BY user_id, null_grp)) AS last_user_action
FROM  (
   SELECT id AS user_id, u."onboardedAt", u."closedAt"
        , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
        , count(a.last_user_action) OVER (PARTITION BY id ORDER BY week_start) AS null_grp
   FROM   "Users" u
   CROSS  JOIN  generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
   LEFT   JOIN (
      SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
      FROM   "Transactions" t
      GROUP  BY 1, 2
      ) t  USING (id, week_start)
   LEFT   JOIN (
      SELECT DISTINCT ON (1, 2)
             "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
      FROM   "UserActions" a
      ORDER  BY 1, 2, "createdAt" DESC
      ) a USING (id, week_start)
   ) sub
ORDER  BY user_id, week_start;

db<>fiddle here

Explanation:

  • Related