I'm interested in assigning values to duplicate rows in a Pandas dataframe as below. The dataframe is below:-
data_1 = {'ID': ['001', '003', '001','002','002','002'], 'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"]}
df = pd.DataFrame(data_1)
The output should be something related to the output of df2.
output = {'ID': ['001', '003', '001','002','002','002'], 'Name': ["XX1", "XX3", "XX1", "XX2", "XX2", "XX2"],"Number": [1, 1, 2, 1, 2,3]}
df2 = pd.DataFrame(output)
How can I autoincrement the "Number" on duplicated "ID"?
CodePudding user response:
As per @Barmar's response, this seems to have answered it. More details are found here
df["rank"] = df.groupby("ID").rank("first", ascending=False)
CodePudding user response:
You can use groupby
cumcount
(adding 1 as the counts are from 0):
df['Number'] = df.groupby('ID').cumcount().add(1)
Output:
ID Name Number
0 001 XX1 1
1 003 XX3 1
2 001 XX1 2
3 002 XX2 1
4 002 XX2 2
5 002 XX2 3