Home > Mobile >  BigQuery: Rolling daily count visitor's summary of payment
BigQuery: Rolling daily count visitor's summary of payment

Time:05-20

I have this data:

date visitor_id total_payment
2022-01-01 A 20
2022-01-01 B 15
2022-01-01 C 20
2022-01-02 B 10
2022-01-02 D 25

I'd like to have daily count of visitor with total_payment equal or greater than 20$, with that being said, result I'm hoping is:

date count_visitor
2022-01-01 2
2022-01-02 4

2022-01-01 is 2 because only A and C have payment more than 20$, however on 2022-01-02 additional 2 more because B is 35$ (sum) and D is 25$.

Is there any possible query for this? I hope I'm clear on my description. Thank you in advance.

CodePudding user response:

You can use this query as solution. First, I calculate cumulative payments of each user. Then, I find the minimum date for each user that exceeds 20$ cumulative payment. At the last step, I count number of users for each minimum date, and also accumulate that number. In the output you don't have to have first_day_users column, but I kept it to make it easier to understand the code.

So the output looks like that: enter image description here

WITH 
data AS(
  SELECT "2022-01-01" AS date, "A" AS visitor_id, 20 AS total_payment UNION ALL
  SELECT "2022-01-01" AS date, "B" AS visitor_id, 15 AS total_payment UNION ALL
  SELECT "2022-01-01" AS date, "C" AS visitor_id, 20 AS total_payment UNION ALL
  SELECT "2022-01-02" AS date, "B" AS visitor_id, 10 AS total_payment UNION ALL
  SELECT "2022-01-02" AS date, "D" AS visitor_id, 25 AS total_payment
),
user_cumulatives as 
(
    SELECT
        visitor_id,
        date,
        SUM(total_payment) OVER (PARTITION BY visitor_id ORDER BY date) as cumulative_payment
    FROM data
),
user_first_dates as 
(
    select visitor_id, min(date) as date
    from user_cumulatives
    where cumulative_payment >= 20
    group by 1
)
select date, count(*) as first_day_users, sum(count(*)) over (order by date) as count_visitor
from user_first_dates
group by 1
order by date

CodePudding user response:

Welcome @Indri

The query below will give you a running sum of the rows per day where the total_amount of greater than of equal to 20, I believe this should give you the answer you are looking for:

WITH data AS(
  SELECT "2022-01-01" AS date, "A" AS visitor_id, 20 AS total_payment
  UNION ALL
  SELECT "2022-01-01" AS date, "B" AS visitor_id, 15 AS total_payment
  UNION ALL
  SELECT "2022-01-01" AS date, "C" AS visitor_id, 20 AS total_payment
  UNION ALL
  SELECT "2022-01-02" AS date, "A" AS visitor_id, 10 AS total_payment
  UNION ALL
  SELECT "2022-01-02" AS date, "D" AS visitor_id, 25 AS total_payment
)

SELECT
*,
COUNT(*) OVER(ORDER BY date)
FROM data
WHERE total_payment >= 20
  • Related