SQL: how to create columns that count "new" things and "existing" things over a-CodePudding

I have data in an orders table in Postgres that looks as follows:

month_year  order_id    customer_id
2016-04     0001        24662
2016-05     0002        24662
2016-05     0002        24662
2016-07     0003        24662
2016-07     0003        24662
2016-07     0004        24662
2016-07     0004        24662
2016-08     0005        24662
2016-08     0006        24662
2016-08     0007        24662
2016-08     0008        24662
2016-08     0009        24662
2016-08     0010        11372
2016-08     0011        11372
2016-09     0012        24662
2016-10     0013        24662
2016-10     0014        11372
2016-11     0015        24662
2016-11     0016        11372
2016-12     0017        11372
2017-01     0018        11372
2017-01     0019        11372

In the count_repeat column, I count the number of "repeat customers" per month. We can define "repeat" as a customer having EVER placed an order (whether it was last month or years ago).

For example, customer 24662 placed his/her first order in April 2016. Therefore, in any subsequent month, if customer 24662 places another order, then he/she will get a 1 count in that month.

The following code:

SELECT 
   o1.month_year, 
   count(distinct o1.customer_id) 
FROM 
    orders o1
LEFT JOIN 
   orders o2 on o1.customer_id = o2.customer_id
   AND o1.month_year > o2.month_year
   AND o1.order_id != o2.order_id
WHERE 
    o2.order_id IS NOT NULL
GROUP BY o1.month_year

Results in:

month_year     repeat_orders
2016-05        1
2016-07        1
2016-08        1
2016-09        1
2016-10        2
2016-11        2
2016-12        1
2017-01        1

Here's a SQL Fiddle for the above: http://sqlfiddle.com/#!17/18795/2

Now, I'd like to add a column for "new_customers", whereby we give a 1 count to the month in which the customer first places his/her first order

The desired output is:

month_year     repeat_orders   new_orders
2016-04        0               1
2016-05        1               0
2016-07        1               0
2016-08        1               1
2016-09        1               0 
2016-10        2               0
2016-11        2               0 
2016-12        1               0
2017-01        1               0

How would I do this?

Thanks!

CodePudding user response：

I think this is what you are looking for:

What is going on is we do a nested query to get the repeat_orders by customer first We then wrap that query up into rowset and do a group by month using Sum on the counts. The new_orders is simple math.

 SELECT month_year, Sum(repeat_orders) as repeat_orders, Sum(total_orders) - Sum(repeat_orders) as new_orders  
FROM (
SELECT
   o1.customer_id,
   o1.month_year, 
   count(distinct o1.customer_id) as total_orders,
   (Select count(distinct o2.customer_id)  
            FROM orders o2 
            Where o1.customer_id = o2.customer_id
            and o2.month_year < o1.month_year) as repeat_orders
FROM 
    orders o1
GROUP BY o1.customer_id, o1.month_year) as Data 

GROUP BY Data.month_year