I have a dataframe with duplicate column names. I want to remove the repeated columns, but I need to keep their values.
I want to drop the second C and D columns at the end, and move their values into the same row of the first C and D columns.
df = df.loc[:,~df.columns.duplicated(keep='first')]
I tried this code. It removes the duplicate columns and keeps the first occurrence, but it also discards the values that were in the removed columns.
CodePudding user response:
Example
Here is a minimal, reproducible example to work with:
import pandas as pd

# columns B and D each appear twice
data = [[0, 1, 2, 3, None, None],
        [1, None, 3, None, 2, 4],
        [2, 3, 4, 5, None, None]]
df = pd.DataFrame(data, columns=list('ABCDBD'))
df
   A    B  C    D    B    D
0  0  1.0  2  3.0  NaN  NaN
1  1  NaN  3  NaN  2.0  4.0
2  2  3.0  4  5.0  NaN  NaN
Code
df.groupby(level=0, axis=1).first()
result:
     A    B    C    D
0  0.0  1.0  2.0  3.0
1  1.0  2.0  3.0  4.0
2  2.0  3.0  4.0  5.0
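Note that `first()` takes the first non-null value in each group, which is why the values from the repeated columns fill the gaps. Recent pandas releases deprecate the `axis=1` argument of `groupby`, so on a newer version you may see a warning. A rough equivalent, assuming the same example data as above, is to transpose, group on the index labels, and transpose back (the name `result` is just for illustration):

```python
import pandas as pd

data = [[0, 1, 2, 3, None, None],
        [1, None, 3, None, 2, 4],
        [2, 3, 4, 5, None, None]]
df = pd.DataFrame(data, columns=list('ABCDBD'))

# Transpose so the duplicate column names become duplicate index labels,
# group rows by label, take the first non-null value per group,
# then transpose back to the original orientation.
result = df.T.groupby(level=0).first().T
print(result)
```

This should give the same A, B, C, D frame as the result above. Transposing a large frame costs extra memory, so treat this as a sketch rather than the only way to do it.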