Python get value from different rows and columns-CodePudding

I have a dataframe like: the dataframe stores phone number, favorite food, and favorite toy for the kids (signed with different Ids). The data is separate in different rows and columns. Some rows may only has Id but nothing else. Input may looks like:

|Id|phone_number|food  |toy |
|--|------------|------|----|
|01|            |apple |    |
|01|9995552222  |banana|    |
|01|            |      |ball|
|01|9995552222  |orange|    |
|02|3332226666  |      |    |
|02|            |boba  |    |
|02|            |      |    |

What I want to get: I want to combine the value in different rows together to make each row unique. The output may looks like:

|Id|phone_number|food  |toy |
|--|------------|------|----|
|01|9995552222  |apple |ball|
|01|9995552222  |banana|ball|
|01|9995552222  |orange|ball|
|02|3332226666  |boba  |    |

Thank you

test = pd.DataFrame({'Id': ['01', '01', '01', '01', '02', '02', '02'], 
                'phone_number': ['', '9995552222', '', '9995552222', '3332226666', '', ''], 
                'food': ['apple', 'banana', '', 'orange', '', 'boba', ''], 
                'toy ': ['', '', 'ball', '', '', '', '']})

CodePudding user response：

You can try groupby Id column then fill the NaN column with bfill and ffill. At last drop the duplicates in 'phone_number', 'food', 'toy'.

test = test.replace('', pd.NA)

out = (test.groupby('Id')
       .apply(lambda g: g.bfill().ffill())
       .drop_duplicates(['phone_number', 'food', 'toy']) # 'toy ' in your given dataframe
       .fillna('')
       )

print(df)

   Id phone_number    food   toy
0  01   9995552222   apple  ball
1  01   9995552222  banana  ball
2  01   9995552222  orange  ball
4  02   3332226666    boba