Dataset Here is the task : Count users that have multiple transactions and have at least one transaction that has been made within 7 days interval of the other one.
Structure of dataset: Row, userId, orderId, date
Date is formatted as YYYY-MM-DDTHH:MM:SS Example: 2016-09-16T11:32:06
I have completed the first part (counting users with multiple transactions), but I do not know how to do the second part in the same query. I will be thankful for help.
Here is the console:
query = '''
SELECT COUNT(*)
FROM
(SELECT userId FROM `dataset` GROUP BY userId HAVING COUNT(orderId) > 1)
'''
project_id = 'acdefg'
df = pd.io.gbq.read_gbq(query, project_id=project_id, dialect='standard')
display(df)
CodePudding user response:
To solve this issue you want to be able to compare each record to a previous record: when was the last order from the same user. This hints to the use of partitions and window functions, in this case LAG
.
A possible way to solve the problem is to organise records per user and order them by orderDate and then for each record have a look at the record just above:
WITH intermediate_table AS (
SELECT
userId,
orderDate,
LAG(orderDate)
OVER (PARTITION BY userId ORDER BY orderDate) -- this is where we pick the orderDate of the record right above, once the orders are organized by userId and ordered by orderDate
FROM `dataset.table`
)
SELECT userId
FROM intermediate_table
WHERE DATE_DIFF(orderDate, previous_order, DAY) <= 7
GROUP BY userId
Once orderDate
and previous_order
info are gathered in the same record, it's easy to compare them and see if there is less than 7 days between the two.
(GROUP BY is used for returning userIds only once in the resulting table)
CodePudding user response:
This may be what you need:
-- for each order calculate the days since that customer's last order
order_profiler AS (
SELECT
orderId,
orderDate,
custId,
DATE_DIFF(orderDate, LAG(orderDate) OVER (PARTITION BY custId ORDER BY orderDate), day) AS order_latency_days,
FROM
`dataset.table`
)
SELECT
custId,
FROM order_profiler
WHERE order_latency_days <= 7
GROUP BY custId