Home > Enterprise >  Assign a value to duplicates in Python
Assign a value to duplicates in Python

Time:10-05

I'm interested in assigning values to duplicate rows in a Pandas dataframe as below. The dataframe is below:-

data_1 = {'ID': ['001', '003', '001','002','002','002'], 'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"]} 
df = pd.DataFrame(data_1)  

The output should be something related to the output of df2.

output = {'ID': ['001', '003', '001','002','002','002'], 'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"],"Number": [1, 1, 2, 1, 2,3]} 
df2 = pd.DataFrame(output)  

How can I autoincrement the "Number" on duplicated "ID"?

CodePudding user response:

As per @Barmar's response, this seems to have answered it. More details are found here

df["rank"] = df.groupby("ID").rank("first", ascending=False)

CodePudding user response:

You can use groupby cumcount (adding 1 as the counts are from 0):

df['Number'] = df.groupby('ID').cumcount().add(1)

Output:

    ID Name  Number
0  001  XX1       1
1  003  XX3       1
2  001  XX1       2
3  002  XX2       1
4  002  XX2       2
5  002  XX2       3
  • Related