Remove duplicates from dataframe but keep the values of other dataframe columns

Time:11-13

I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1001, 120, np.nan], [1001, np.nan, 30], [1004, 160, np.nan], [1005, 160, np.nan],
                   [1006, np.nan, 8], [1010, 160, np.nan], [1010, np.nan, 4]],
                  columns=['CustomerNr', 'Period1', 'Period2'])

   CustomerNr  Period1  Period2
0        1001    120.0      NaN
1        1001      NaN     30.0
2        1004    160.0      NaN
3        1005    160.0      NaN
4        1006      NaN      8.0
5        1010    160.0      NaN
6        1010      NaN      4.0

I need to produce the result below, where rows with a duplicated CustomerNr are collapsed into a single row but the values of Period1 and Period2 are kept:

   CustomerNr  Period1  Period2
0        1001    120.0     30.0
1        1004    160.0      NaN
2        1005    160.0      NaN
3        1006      NaN      8.0
4        1010    160.0      4.0

CodePudding user response:

df.groupby('CustomerNr').agg('min')
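
A note on when this is safe: 'min' skips NaN, so it returns the expected result for the data in the question, where each customer has at most one non-null value per column. The sketch below uses a small made-up frame (the name demo and its values are assumptions for illustration, not part of the question) to show that 'min' and first() diverge once a group holds several non-null values in the same column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: customer 1001 has two non-null Period1 values.
demo = pd.DataFrame({'CustomerNr': [1001, 1001],
                     'Period1': [120, 90],
                     'Period2': [np.nan, 30]})

# 'min' keeps the smallest non-null value per column and group.
by_min = demo.groupby('CustomerNr').agg('min')

# first() keeps the first non-null value per column and group.
by_first = demo.groupby('CustomerNr').first()

print(by_min)
print(by_first)
```

If a customer can have more than one value per period, pick the aggregation that matches the intent (smallest, first, sum, and so on) rather than relying on 'min' by default.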

CodePudding user response:

You can group by CustomerNr and take the first item per group. first() skips NaN, so each column gets its first non-null value within the group:

df.groupby('CustomerNr').first()

output:

             Period1   Period2
CustomerNr                    
1001        120.0000   30.0000
1004        160.0000       NaN
1005        160.0000       NaN
1006             NaN    8.0000
1010        160.0000    4.0000
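
The result above has CustomerNr as the index, while the layout requested in the question has it as an ordinary column with a fresh 0..4 index. A minimal sketch of one way to get that shape, assuming the same df as in the question, is to call reset_index() on the grouped result (the variable name result is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1001, 120, np.nan], [1001, np.nan, 30], [1004, 160, np.nan],
     [1005, 160, np.nan], [1006, np.nan, 8], [1010, 160, np.nan],
     [1010, np.nan, 4]],
    columns=['CustomerNr', 'Period1', 'Period2'])

# first() takes the first non-null value per column within each group;
# reset_index() turns the CustomerNr index back into a regular column.
result = df.groupby('CustomerNr').first().reset_index()
print(result)
```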