I have data in an orders
table in Postgres that looks as follows:
month_year order_id customer_id
2016-04 0001 24662
2016-05 0002 24662
2016-05 0002 24662
2016-07 0003 24662
2016-07 0003 24662
2016-07 0004 24662
2016-07 0004 24662
2016-08 0005 24662
2016-08 0006 24662
2016-08 0007 24662
2016-08 0008 24662
2016-08 0009 24662
2016-08 0010 11372
2016-08 0011 11372
2016-09 0012 24662
2016-10 0013 24662
2016-10 0014 11372
2016-11 0015 24662
2016-11 0016 11372
2016-12 0017 11372
2017-01 0018 11372
2017-01 0019 11372
In the count_repeat
column, I count the number of "repeat customers" per month. We can define "repeat" as a customer having EVER placed an order (whether it was last month or years ago).
For example, customer 24662
placed his/her first order in April 2016. Therefore, in any subsequent month, if customer 24662
places another order, then he/she will get a 1 count in that month.
The following code:
SELECT
o1.month_year,
count(distinct o1.customer_id)
FROM
orders o1
LEFT JOIN
orders o2 on o1.customer_id = o2.customer_id
AND o1.month_year > o2.month_year
AND o1.order_id != o2.order_id
WHERE
o2.order_id IS NOT NULL
GROUP BY o1.month_year
Results in:
month_year repeat_orders
2016-05 1
2016-07 1
2016-08 1
2016-09 1
2016-10 2
2016-11 2
2016-12 1
2017-01 1
Here's a SQL Fiddle for the above: http://sqlfiddle.com/#!17/18795/2
Now, I'd like to add a column for "new_customers", whereby we give a 1 count to the month in which the customer first places his/her first order
The desired output is:
month_year repeat_orders new_orders
2016-04 0 1
2016-05 1 0
2016-07 1 0
2016-08 1 1
2016-09 1 0
2016-10 2 0
2016-11 2 0
2016-12 1 0
2017-01 1 0
How would I do this?
Thanks!
CodePudding user response:
I think this is what you are looking for:
What is going on is we do a nested query to get the repeat_orders by customer first We then wrap that query up into rowset and do a group by month using Sum on the counts. The new_orders is simple math.
SELECT month_year, Sum(repeat_orders) as repeat_orders, Sum(total_orders) - Sum(repeat_orders) as new_orders
FROM (
SELECT
o1.customer_id,
o1.month_year,
count(distinct o1.customer_id) as total_orders,
(Select count(distinct o2.customer_id)
FROM orders o2
Where o1.customer_id = o2.customer_id
and o2.month_year < o1.month_year) as repeat_orders
FROM
orders o1
GROUP BY o1.customer_id, o1.month_year) as Data
GROUP BY Data.month_year