I have the following dataframe:
Sometimes the index is duplicated, and in that case I want to change the value in the column 'Hotspot'. For index 0, the first prograding_feature_polygon_30 should be changed to prograding_feature_polygon_30_1 and the second to prograding_feature_polygon_30_2.
The same applies to index 1: the values should again be changed to prograding_feature_polygon_30_1 and prograding_feature_polygon_30_2, and so on.
The index is not always duplicated; when it is not, the value in 'Hotspot' should remain the same. Does anyone know an easy way to do this?
Regards,
Dante
CodePudding user response:
Data sample
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': np.repeat([*'ABCD'], [2, 1, 3, 1]),
                   'b': [*range(7)]},
                  index=np.repeat([*range(4)], [2, 1, 3, 1]))
print(df)
   a  b
0  A  0
0  A  1
1  B  2
2  C  3
2  C  4
2  C  5
3  D  6
Problem
For each duplicate in the index, we want to append a consecutive number to the values in column a. So, A_1, A_2 for index value 0, and C_1, C_2, C_3 for index value 2. Values without duplicates (index values 1 and 3) should be unaffected.
Solution
# where the index label is repeated, append '_' plus a 1-based counter per label
df.a = np.where(df.index.duplicated(keep=False),
                df.a + '_' + df.groupby(level=0).cumcount().add(1).astype(str),
                df.a)
print(df)
     a  b
0  A_1  0
0  A_2  1
1    B  2
2  C_1  3
2  C_2  4
2  C_3  5
3    D  6
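Applied to the 'Hotspot' column from the question, the same idea could be wrapped in a small helper. This is only a sketch: the function name number_duplicates, the sep parameter and the sample values are made up for illustration, since the original dataframe from the question is not reproduced here.

```python
import numpy as np
import pandas as pd


def number_duplicates(df, column, sep="_"):
    """Return a copy of df in which `column` gets a 1-based suffix
    on every row whose index label occurs more than once."""
    out = df.copy()
    # True for every row whose index label is repeated somewhere in the frame
    is_dup = out.index.duplicated(keep=False)
    # 1, 2, 3, ... within each index label, as strings for concatenation
    counter = out.groupby(level=0).cumcount().add(1).astype(str)
    out[column] = np.where(is_dup,
                           out[column].astype(str) + sep + counter,
                           out[column])
    return out


# Hypothetical sample: index labels 0 and 1 are duplicated, 2 is unique.
hotspots = pd.DataFrame(
    {"Hotspot": ["prograding_feature_polygon_30"] * 5},
    index=[0, 0, 1, 1, 2],
)
result = number_duplicates(hotspots, "Hotspot")
print(result)
```

Returning a copy rather than assigning in place is a design choice of this sketch; assigning directly to the column, as in the solution, works just as well.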
Explanation
- Use df.index.duplicated with param keep=False to get an array with True for duplicates, False for non-duplicates.
- Use this array inside np.where. If True, we want df.a + '_' + consecutive number, else simply df.a.
- Use df.groupby on the index (level=0), and apply .cumcount to enumerate items per group. Chain .add(1) to start at 1 instead of 0. Finally, use astype(str), in view of the concatenation with df.a.
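To see what each piece contributes, the intermediate values can be printed separately. The sketch below rebuilds the sample frame from the data sample; the variable names mask, counter and suffixed are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.repeat([*'ABCD'], [2, 1, 3, 1]),
                   'b': [*range(7)]},
                  index=np.repeat([*range(4)], [2, 1, 3, 1]))

# Step 1: flag every row whose index label is repeated
mask = df.index.duplicated(keep=False)
# Step 2: enumerate rows within each index label, starting at 1
counter = df.groupby(level=0).cumcount().add(1)
# Step 3: build the suffixed value for every row (np.where later picks from it)
suffixed = df.a + '_' + counter.astype(str)

print(mask.tolist())      # [True, True, False, True, True, True, False]
print(counter.tolist())   # [1, 2, 1, 1, 2, 3, 1]
print(suffixed.tolist())  # ['A_1', 'A_2', 'B_1', 'C_1', 'C_2', 'C_3', 'D_1']
```

Note that suffixed also contains B_1 and D_1; np.where discards those because the mask is False for the non-duplicated rows.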