Given a DataFrame that represents instances of called customers:
import pandas as pd
import numpy as np
df_1 = pd.DataFrame({"customer_id" : [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5]})
The data is ordered by time such that every customer is a time-series and every customer has different timestamps. Thus I need a column that consists of the ranked timepoints:
df_2 = pd.DataFrame({"customer_id" : [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5],
"call_nr" : [0,1,2,0,1,0,1,2,3,0,0,1]})
After trying different approaches I came up with this to create call_nr:
np.concatenate([np.arange(df["customer_id"].value_counts().loc[i]) for i in df["customer_id"].unique()])
It works, but I doubt this is best practice. Is there a better solution?
CodePudding user response:
A simpler solution would be to groupby
your 'customer_id' and use cumcount
:
>>> df_1.groupby('customer_id').cumcount()
0 0
1 1
2 2
3 0
4 1
5 0
6 1
7 2
8 3
9 0
10 0
11 1
which you can assign back as a column in your dataframe