Home > Back-end >  incremental counter to a dataframe based on unique values from a column
incremental counter to a dataframe based on unique values from a column

Time:09-23

Existing Dataframe :

Unique_Id          sms_sent_date         
A                    11-01-2022                   
A                    11-01-2022                   
A                    12-01-2022                   
A                    13-01-2022                   

B                    02-02-2022                   
B                    03-02-2022                   
B                    06-02-2022                   
B                    06-02-2022                   

Expected Dataframe :

Unique_Id          sms_sent_date         unique_sms_counter
A                    11-01-2022                   1
A                    11-01-2022                   1
A                    12-01-2022                   2
A                    13-01-2022                   3

B                    02-02-2022                   1
B                    03-02-2022                   2
B                    06-02-2022                   3
B                    06-02-2022                   3

I trying to add the smsq_counter which shall count the unique sms sent to a unique Id. tried this df.groupby(['Unique_Id','sms_sent_date']).cumcount() but stuck with applying it on entire dataframe

CodePudding user response:

You can use:

df['unique_sms_counter'] = (
  # get non duplicates
 (~df.duplicated(['Unique_Id', 'sms_sent_date']))
  # increment per group
 .groupby(df['Unique_Id']).cumsum()
)

output:

  Unique_Id sms_sent_date  unique_sms_counter
0         A    11-01-2022                   1
1         A    11-01-2022                   1
2         A    12-01-2022                   2
3         A    13-01-2022                   3
4         B    02-02-2022                   1
5         B    03-02-2022                   2
6         B    06-02-2022                   3
7         B    06-02-2022                   3

CodePudding user response:

You can use factorize on groupby:

df['unique_sms_counter'] = df.groupby(['Unique_Id'])['sms_sent_date'].transform(lambda x: x.factorize()[0] 1)

Or, if your date is actual datetime type, you can use rank:

df['sms_sent_date'] = pd.to_datetime(df['sms_sent_date'], dayfirst=True)

df.groupby('Unique_Id')['sms_sent_date'].rank('dense').astype(int)

Output:

  Unique_Id sms_sent_date  unique_sms_counter
0         A    11-01-2022                   1
1         A    11-01-2022                   1
2         A    12-01-2022                   2
3         A    13-01-2022                   3
4         B    02-02-2022                   1
5         B    03-02-2022                   2
6         B    06-02-2022                   3
7         B    06-02-2022                   3
  • Related