I have a table like this
month | customer_id |
---|---|
2015-01-01 | customer_A |
2015-01-01 | customer_B |
2015-02-01 | customer_A |
2015-02-01 | customer_C |
2015-03-01 | customer_A |
2015-03-01 | customer_C |
2015-03-01 | customer_D |
2015-04-01 | customer_C |
2015-04-01 | customer_D |
I want to find out how active my customers are every month.
I tried to write a loop that can do like the example below:
Filter to count the unique customer_id in Jan 2015 (in the table above: 2)
Filter all the customer_id in Feb 2015 and check how many are in the customer list of Jan 2015. (in the table above: 1)
Continue to the last month
Loop back to start from Feb 2015
The result should be a table like this (the numbers are random)
month | month_1 | customers |
---|---|---|
2015-01-01 | 2015-02-01 | 2 (A&B) |
2015-01-01 | 2015-02-01 | 1 (only A left) |
2015-01-01 | 2015-03-01 | 1 (only A left) |
2015-01-01 | 2015-04-01 | 0 (none in the original left) |
2015-02-01 | 2015-03-01 | 2 (A&C) |
2015-02-01 | 2015-04-01 | 1 (only C left) |
Thank you so much!
CodePudding user response:
Here is some code to do what you are asking, assuming the input is in the form shown.
table = [
{'month':'2015-01-01','customer_id':'customer_A'},
{'month':'2015-01-01','customer_id':'customer_B'},
{'month':'2015-02-01','customer_id':'customer_A'},
{'month':'2015-02-01','customer_id':'customer_C'},
{'month':'2015-03-01','customer_id':'customer_A'},
{'month':'2015-03-01','customer_id':'customer_C'},
{'month':'2015-03-01','customer_id':'customer_D'},
{'month':'2015-04-01','customer_id':'customer_C'},
{'month':'2015-04-01','customer_id':'customer_D'}
]
result = []
months = sorted(list({row['month'] for row in table}))
for start in range(len(months)):
custsAtStart = {row['customer_id'] for row in table if row['month'] == months[start]}
for end in range(start, len(months)):
custsAtEnd = {row['customer_id'] for row in table if row['month'] == months[end] and row['customer_id'] in custsAtStart}
result.append((months[start], months[end], list(custsAtEnd)))
print(f"{'month' : <12} {'month_1' : <12} {'customers'}")
[print(f"{row[0] : <12} {row[1] : <12} {row[2]}") for row in result]
For a given month, it identifies the active customers and shows how many of these customers persist over the coming months. It then advances the initial observation to the next month and repeats the analysis.
Here's the output:
month month_1 customers
2015-01-01 2015-01-01 ['customer_A', 'customer_B']
2015-01-01 2015-02-01 ['customer_A']
2015-01-01 2015-03-01 ['customer_A']
2015-01-01 2015-04-01 []
2015-02-01 2015-02-01 ['customer_A', 'customer_C']
2015-02-01 2015-03-01 ['customer_A', 'customer_C']
2015-02-01 2015-04-01 ['customer_C']
2015-03-01 2015-03-01 ['customer_A', 'customer_D', 'customer_C']
2015-03-01 2015-04-01 ['customer_D', 'customer_C']
2015-04-01 2015-04-01 ['customer_D', 'customer_C']