Home > Blockchain >  Pandas join record by matching on at least one column
Pandas join record by matching on at least one column

Time:11-01

I have the following dataframe:

id          phone       email
10352897        
10352897    10225967    
10352897                [email protected]
10352897    10225967    [email protected]
            10225967    
            10225967    [email protected]
                        [email protected]
23578910        
23578910    38256789    
23578910                [email protected]
23578910    38256789    [email protected]
            38256789    
            38256789    [email protected]
                        [email protected]

I would like that when it coincides in at least one of the three columns, the record joins and prioritizes the filled fields over the empty ones, in the end in this example I would expect the following output:

id          phone       email
10352897    10225967    [email protected]
23578910    38256789    [email protected]

How would you go about doing this?

CodePudding user response:

Use pandas.core.groupby.DataFrameGroupBy.aggregate as follows

df_new = df.groupby('id').agg(lambda x: x.value_counts().index[0]).reset_index()

[Out]:

         id     phone            email
0  10352897  10225967   [email protected]
1  23578910  38256789  [email protected]
  • Related