Edit columns based on duplicate values found in Pandas

Time:12-15

I have the dataframe below:

No:      Fee:
111      500
111      500
222      300
222      300
123      400

If the data in No is duplicated, I want to keep only one fee and remove the others. It should look like this:

    No:      Fee:
    111      500
    111      
    222      300
    222      
    123      400

I have no idea where to start, so any guidance would be appreciated.

Thanks.

CodePudding user response:

Use DataFrame.duplicated to find the repeated rows, then set an empty string on them with DataFrame.loc:

# test for duplicates across both columns (use df.duplicated('No') to test only No)
mask = df.duplicated(['No','Fee'])

df.loc[mask, 'Fee'] = ''
print (df)
    No  Fee
0  111  500
1  111     
2  222  300
3  222     
4  123  400
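
For reference, here is a self-contained sketch of the same step. The sample frame is rebuilt from the question. Testing only the No column (mask_no) is an assumption about what the asker wants, and the names mask_no, mask_both and out are illustrative. Casting Fee to object before assigning the empty string is a precaution, since newer pandas versions warn when a string is written into an integer column.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"No": [111, 111, 222, 222, 123],
                   "Fee": [500, 500, 300, 300, 400]})

# True for every repeat of a value already seen in 'No' (first occurrence stays False)
mask_no = df.duplicated(subset="No")
# True only when the whole (No, Fee) pair repeats
mask_both = df.duplicated(subset=["No", "Fee"])

# Work on an object-dtype copy so that a string can be stored next to the numbers
out = df.astype({"Fee": object})
out.loc[mask_no, "Fee"] = ""
print(out)
```

For this sample the two masks agree. They would differ only if the same No appeared with different fees.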

But the column is then no longer numeric, because it mixes numbers with strings:

print (df['Fee'].dtype)
object

If you need a numeric column, a possible solution is to use missing values instead:

import numpy as np

df.loc[mask, 'Fee'] = np.nan
print (df)
    No    Fee
0  111  500.0
1  111    NaN
2  222  300.0
3  222    NaN
4  123  400.0

print (df['Fee'].dtype)
float64

To keep integers rather than floats, convert the column to the nullable integer dtype Int64:

df.loc[mask, 'Fee'] = np.nan

df['Fee'] = df['Fee'].astype('Int64')
print (df)
    No   Fee
0  111   500
1  111  <NA>
2  222   300
3  222  <NA>
4  123   400

print (df['Fee'].dtype)
Int64
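
The missing-value approach can also be sketched without assigning in place, which sidesteps the dtype warnings that recent pandas versions raise when NaN is written into an integer column. This is a minimal sketch, not the only way to do it. The frame is rebuilt from the question, the name dup is illustrative, and Series.mask is swapped in for the .loc assignment.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"No": [111, 111, 222, 222, 123],
                   "Fee": [500, 500, 300, 300, 400]})

# Rows whose (No, Fee) pair has already appeared
dup = df.duplicated(subset=["No", "Fee"])

# Series.mask puts NaN where the condition is True (giving float64),
# then the nullable Int64 dtype restores integers and shows <NA> for the gaps
df["Fee"] = df["Fee"].mask(dup).astype("Int64")
print(df)
```

The column stays numeric, so aggregations such as df["Fee"].sum() skip the missing entries by default.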