How to eliminate duplicate rows in column A keeping the maximum value in B in Python

Time:04-23

I'm working with data from an Excel file like this:

A                  B
2001-05-01 12:30   10
2001-05-01 12:30   20
2001-05-05 11:50   30
2001-05-05 11:50   40
2002-03-22 14:12   10

I'm using this line of code to eliminate the duplicates in A while keeping the maximum of B:

df_clean=df_raw.sort_values('A', ascending=False).drop_duplicates('B').sort_index()

but I get this error:

Index(['B'], dtype='object')

I don't know what the problem could be, since I run this line right after loading the file.

CodePudding user response:

If I can assume that your index is just a RangeIndex, then I think what you are looking for is:

df_clean=df_raw.sort_values('A', ascending=False).drop_duplicates('B', ignore_index=True)

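instead of calling sort_index() afterwards. With ignore_index=True (available in drop_duplicates since pandas 1.0) the result is relabeled 0, 1, ..., n-1.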
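Note that the line in the question sorts by A and drops duplicates on B, which is the reverse of the stated goal (one row per A, with the largest B). A minimal sketch of the intended operation, assuming the columns really are named "A" and "B" and using the sample data from the question, could look like this:

```python
import pandas as pd

# Sample data copied from the question.
df_raw = pd.DataFrame({
    "A": ["2001-05-01 12:30", "2001-05-01 12:30",
          "2001-05-05 11:50", "2001-05-05 11:50",
          "2002-03-22 14:12"],
    "B": [10, 20, 30, 40, 10],
})

# Sort by B descending so the largest B comes first within each A,
# keep the first row per A, then restore the original row order.
df_clean = (
    df_raw.sort_values("B", ascending=False)
          .drop_duplicates("A")
          .sort_index()
)

# Alternative: aggregate directly; this returns one row per A with max(B).
df_max = df_raw.groupby("A", as_index=False)["B"].max()

print(df_clean)
print(df_max)
```

The sort-then-drop version keeps whole rows, which helps if the real file has more columns than A and B. The groupby version only returns the grouped column and the aggregated one.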

CodePudding user response:

It seems to me that your second column name contains one or more spaces before "B", something like:

" B"

Try:

df_raw.columns = ["A","B"]

before your statement.
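If stray whitespace is indeed the cause, a variant that does not depend on column order is to inspect the labels and strip them. This is only a sketch: the frame below is a hypothetical stand-in for the loaded file, with a leading space deliberately put in the second header.

```python
import pandas as pd

# Hypothetical stand-in for the loaded file; note the leading space in " B".
df_raw = pd.DataFrame({"A": ["x", "x", "y"], " B": [1, 2, 3]})

# Printing the labels as a list makes stray spaces visible.
print(df_raw.columns.tolist())

# Strip leading/trailing whitespace from every column label.
df_raw.columns = df_raw.columns.str.strip()

print(df_raw.columns.tolist())
```

After this, 'B' can be used as a column label in drop_duplicates and sort_values.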
