I have a somewhat large array (~3000 rows) where the first column contains duplicate string values, each repeated a varying number of times. I want to blank out these duplicates without shifting the other cells in this column.
Input
row/rack    shelf  tilt
row1.rack1  B      5
row1.rack1  A      nan
row1.rack2  C      nan
row1.rack2  B      nan
row1.rack2  A      17
Desired Output
row/rack    shelf  tilt
row1.rack1  B      5
            A      nan
row1.rack2  C      nan
            B      nan
            A      17
Is there a good way to do this? I've been searching through Stack Overflow and other sites but haven't been able to find anything like this.
CodePudding user response:
Use .duplicated to flag every repeat after the first occurrence, and .loc to overwrite those cells with an empty string:
df.loc[df['row/rack'].duplicated(keep='first'),'row/rack'] = ''
print(df)
     row/rack shelf  tilt
0  row1.rack1     B   5.0
1                 A   NaN
2  row1.rack2     C   NaN
3                 B   NaN
4                 A  17.0
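For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, so the construction step is an assumption about how the data actually gets loaded; the variable name dup is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (assumed layout, built by hand).
df = pd.DataFrame({
    "row/rack": ["row1.rack1", "row1.rack1", "row1.rack2", "row1.rack2", "row1.rack2"],
    "shelf": ["B", "A", "C", "B", "A"],
    "tilt": [5, np.nan, np.nan, np.nan, 17],
})

# True for every repeat of a value after its first occurrence.
dup = df["row/rack"].duplicated(keep="first")

# Overwrite only the flagged cells; all rows and other columns stay in place.
df.loc[dup, "row/rack"] = ""
print(df)
```

Because only cell values are overwritten, the row count and the shelf and tilt columns are unchanged.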
CodePudding user response:
Use mask to replace the duplicates with empty strings:
df["row/rack"] = df["row/rack"].mask(df["row/rack"].duplicated(), "")
>>> df
     row/rack shelf  tilt
0  row1.rack1     B   5.0
1                 A   NaN
2  row1.rack2     C   NaN
3                 B   NaN
4                 A  17.0
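One caveat: duplicated() flags a value anywhere it reappears in the column, not only in adjacent rows. If the same rack can show up again further down and should be printed again there, a possible variant is to compare each row with the one directly above it. This is only a sketch under that assumption; the helper name blank_consecutive_repeats and the sample Series are hypothetical and not from the question.

```python
import pandas as pd

def blank_consecutive_repeats(s: pd.Series, fill: str = "") -> pd.Series:
    """Blank a value only when it equals the value in the row directly above."""
    # shift() moves values down one row; the first row is compared with a missing value.
    repeats = s.eq(s.shift())
    return s.mask(repeats, fill)

# Hypothetical data: row1.rack1 reappears after a different rack.
racks = pd.Series(["row1.rack1", "row1.rack1", "row1.rack2", "row1.rack1"])
result = blank_consecutive_repeats(racks)
print(result.tolist())
```

With the sample data in the question, where each rack's rows are contiguous, this gives the same result as duplicated().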