Existing Dataframe :
Unique_Id sms_sent_date
A 11-01-2022
A 11-01-2022
A 12-01-2022
A 13-01-2022
B 02-02-2022
B 03-02-2022
B 06-02-2022
B 06-02-2022
Expected Dataframe :
Unique_Id sms_sent_date unique_sms_counter
A 11-01-2022 1
A 11-01-2022 1
A 12-01-2022 2
A 13-01-2022 3
B 02-02-2022 1
B 03-02-2022 2
B 06-02-2022 3
B 06-02-2022 3
I trying to add the smsq_counter which shall count the unique sms sent to a unique Id.
tried this df.groupby(['Unique_Id','sms_sent_date']).cumcount()
but stuck with applying it on entire dataframe
CodePudding user response:
You can use:
df['unique_sms_counter'] = (
# get non duplicates
(~df.duplicated(['Unique_Id', 'sms_sent_date']))
# increment per group
.groupby(df['Unique_Id']).cumsum()
)
output:
Unique_Id sms_sent_date unique_sms_counter
0 A 11-01-2022 1
1 A 11-01-2022 1
2 A 12-01-2022 2
3 A 13-01-2022 3
4 B 02-02-2022 1
5 B 03-02-2022 2
6 B 06-02-2022 3
7 B 06-02-2022 3
CodePudding user response:
You can use factorize
on groupby:
df['unique_sms_counter'] = df.groupby(['Unique_Id'])['sms_sent_date'].transform(lambda x: x.factorize()[0] 1)
Or, if your date is actual datetime type, you can use rank
:
df['sms_sent_date'] = pd.to_datetime(df['sms_sent_date'], dayfirst=True)
df.groupby('Unique_Id')['sms_sent_date'].rank('dense').astype(int)
Output:
Unique_Id sms_sent_date unique_sms_counter
0 A 11-01-2022 1
1 A 11-01-2022 1
2 A 12-01-2022 2
3 A 13-01-2022 3
4 B 02-02-2022 1
5 B 03-02-2022 2
6 B 06-02-2022 3
7 B 06-02-2022 3