I have customer transaction dataset like this:
ID | Date | Amount |
---|---|---|
1 | 1-1-21 | 5 |
2 | 2-1-21 | 8 |
1 | 2-1-21 | 6 |
1 | 3-1-21 | 5 |
2 | 3-1-21 | 9 |
2 | 3-1-21 | 10 |
I have to groupby and aggregate the data on customer level like this:
ID | Total Amount | Number of days active |
---|---|---|
1 | 16 | 3 |
2 | 27 | 2 |
Total Amount
= sum of all Amount
column
Number of days active
= Number of days customer made 1 or more transactions
How do I calculate my column Number of days active
? So far I have tried:
df= df.groupby('ID').agg({'Amount': lambda price: price.sum(),
'Date': lambda date: len(date).days})
My Total Amount
column is fine but I cannot find the Number of days active
CodePudding user response:
Let us do groupby
with agg
: nunique
sum
out = df.groupby('ID').agg(Numberofdaysactive = ('Date','nunique'),TotalAmount = ('Amount','sum')).reset_index()
out
Out[384]:
ID Numberofdaysactive TotalAmount
0 1 3 16
1 2 2 27
CodePudding user response:
nunique
should be what you need. That is, the aggregate df can be calculated by:
df_agg = df.groupby('ID').agg({"Amount":sum, "Date":pd.Series.nunique})
Note how you can pass function handles directly to agg
.