I have a dataframe that stores the phone number, favorite food, and favorite toy for kids (identified by different Ids). The data is spread across different rows and columns, and some rows may contain only an Id and nothing else. The input may look like:
|Id|phone_number|food |toy |
|--|------------|------|----|
|01| |apple | |
|01|9995552222 |banana| |
|01| | |ball|
|01|9995552222 |orange| |
|02|3332226666 | | |
|02| |boba | |
|02| | | |
What I want to get: I want to combine the values from different rows so that each row is complete and unique. The output may look like:
|Id|phone_number|food |toy |
|--|------------|------|----|
|01|9995552222 |apple |ball|
|01|9995552222 |banana|ball|
|01|9995552222 |orange|ball|
|02|3332226666 |boba | |
Thank you
import pandas as pd

test = pd.DataFrame({'Id': ['01', '01', '01', '01', '02', '02', '02'],
                     'phone_number': ['', '9995552222', '', '9995552222', '3332226666', '', ''],
                     'food': ['apple', 'banana', '', 'orange', '', 'boba', ''],
                     'toy ': ['', '', 'ball', '', '', '', '']})  # note the trailing space in 'toy '
CodePudding user response:
You can try `groupby` on the `Id` column, then fill the NaN values with `bfill` and `ffill`. Finally, drop the duplicates in 'phone_number', 'food', 'toy'.
test = test.replace('', pd.NA)
out = (test.groupby('Id')
           .apply(lambda g: g.bfill().ffill())
           .drop_duplicates(['phone_number', 'food', 'toy '])  # 'toy ' with a trailing space, as in your dataframe
           .fillna('')
      )
print(out)
Id phone_number food toy
0 01 9995552222 apple ball
1 01 9995552222 banana ball
2 01 9995552222 orange ball
4 02 3332226666 boba
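If `groupby(...).apply(...)` emits a FutureWarning or yields a MultiIndex on your pandas version, here is a variant sketch that fills within each group column by column instead of using `apply` (assuming a recent pandas; the `cols` list is a helper name I introduced, and `Id` is included in the duplicate check so identical values under different Ids are not merged):

```python
import pandas as pd

test = pd.DataFrame({'Id': ['01', '01', '01', '01', '02', '02', '02'],
                     'phone_number': ['', '9995552222', '', '9995552222', '3332226666', '', ''],
                     'food': ['apple', 'banana', '', 'orange', '', 'boba', ''],
                     'toy ': ['', '', 'ball', '', '', '', '']})

# Treat empty strings as missing so the fills can propagate real values
test = test.replace('', pd.NA)

# Backward- then forward-fill within each Id group, column-wise
cols = ['phone_number', 'food', 'toy ']
filled = test.copy()
filled[cols] = test.groupby('Id')[cols].bfill()
filled[cols] = filled.groupby('Id')[cols].ffill()

# Drop rows that became identical after filling; keep Id in the subset
out = filled.drop_duplicates(['Id'] + cols).fillna('')
print(out)
```

This avoids `apply` entirely, since `GroupBy.bfill`/`GroupBy.ffill` already operate per group and preserve the original row index.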