I have a DataFrame with the following data:
size col1 col2
1.5 {'val':1.1, 'id': 10} None
2.0 {'val':1.1, 'id': 11} None
3.0 {'val':1.1, 'id': 20} None
3.0 None {'val':1.1, 'id': 6}
I am trying to merge these rows and get rid of the None values, but whenever I call df.groupby(by=['size']).max() (or a similar aggregation), the dict values are converted to NaN.
Is there a way to merge these rows and keep the dict values?
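To reproduce the question, the frame can be built roughly like this. This is a minimal sketch that assumes the blanks are real Python None objects (not the string 'None') and that size is a float column. The dicts and the None values end up in object-dtype columns.

```python
import pandas as pd

# Each cell in col1/col2 holds either a dict or a real Python None,
# so pandas stores both columns with object dtype.
df = pd.DataFrame({
    "size": [1.5, 2.0, 3.0, 3.0],
    "col1": [{"val": 1.1, "id": 10}, {"val": 1.1, "id": 11},
             {"val": 1.1, "id": 20}, None],
    "col2": [None, None, None, {"val": 1.1, "id": 6}],
})

print(df.dtypes)
```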
Expected Result:
size col1 col2
1.5 {'val':1.1, 'id': 10} None
2.0 {'val':1.1, 'id': 11} None
3.0 {'val':1.1, 'id': 20} {'val':1.1, 'id': 6}
The rows sharing size=3.0 (two here, possibly more in the real data) are merged into a single row, and the dict values in both columns are kept.
Answer:
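Try groupby with first(), which returns the first non-null value of each column within each group:
out = df.groupby('size').first()  # append .reset_index() to get 'size' back as a column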
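A self-contained version of this answer, rebuilding the sample frame under the same assumption (real None values). first() skips nulls column by column, so for size 3.0 it takes the dict from the first row in col1 and the dict from the second row in col2. A group whose column is entirely null (col2 for size 1.5, for example) comes out as a missing value, which may be NaN rather than None.

```python
import pandas as pd

df = pd.DataFrame({
    "size": [1.5, 2.0, 3.0, 3.0],
    "col1": [{"val": 1.1, "id": 10}, {"val": 1.1, "id": 11},
             {"val": 1.1, "id": 20}, None],
    "col2": [None, None, None, {"val": 1.1, "id": 6}],
})

# first() picks the first non-null value of every column within each group;
# unlike max(), it never has to compare the dicts with each other.
out = df.groupby("size").first().reset_index()
print(out)
```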
Update: if the blanks are the string 'None' rather than Python's None, convert them to NaN first:
import numpy as np
out = df.replace({'None': np.nan}).groupby('size').first()  # append .reset_index() to get 'size' back as a column
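When the placeholders are the literal string 'None', first() treats them as ordinary values, so for size 3.0 it would return 'None' in col2 instead of the dict. A sketch of that case, assuming the placeholders are exactly the string 'None':

```python
import numpy as np
import pandas as pd

# Same data, but the blanks are the *string* "None" instead of Python None.
df = pd.DataFrame({
    "size": [1.5, 2.0, 3.0, 3.0],
    "col1": [{"val": 1.1, "id": 10}, {"val": 1.1, "id": 11},
             {"val": 1.1, "id": 20}, "None"],
    "col2": ["None", "None", "None", {"val": 1.1, "id": 6}],
})

# Turn the placeholder strings into NaN so that first() skips them.
out = df.replace({"None": np.nan}).groupby("size").first().reset_index()
print(out)
```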
Answer:
After a lot of reading and testing, it looks like pandas groupby aggregations such as max() cannot handle non-numeric values like dicts: instead of being merged, the dict columns are dropped or come back as NaN.
My solution required reframing the whole problem. The new data format is:
size col value
1.5 col1 {'val':1.1, 'id': 10}
2.0 col1 {'val':1.1, 'id': 11}
3.0 col1 {'val':1.1, 'id': 20}
3.0 col2 {'val':1.1, 'id': 6}
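The author controlled the input format, but if the data only exists in the wide layout from the question, one way to reach this long layout is melt() followed by dropping the null cells. A sketch, assuming real None values as before; the column names col and value are chosen to match the table above.

```python
import pandas as pd

wide = pd.DataFrame({
    "size": [1.5, 2.0, 3.0, 3.0],
    "col1": [{"val": 1.1, "id": 10}, {"val": 1.1, "id": 11},
             {"val": 1.1, "id": 20}, None],
    "col2": [None, None, None, {"val": 1.1, "id": 6}],
})

# One row per (size, column) pair, then discard the cells that were None.
long_df = (
    wide.melt(id_vars="size", value_vars=["col1", "col2"],
              var_name="col", value_name="value")
        .dropna(subset=["value"])
        .reset_index(drop=True)
)
print(long_df)
```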
With this new data format I was able to use:
df = df.pivot(index='size', columns='col', values='value')
df = df.reset_index()
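A self-contained version of the pivot step, building the long frame directly. pivot() only reshapes and never aggregates, so the dicts pass through untouched. It does raise a ValueError if the same (size, col) pair appears more than once, so this relies on each cell holding at most one dict. Cells with no value come back as NaN rather than None.

```python
import pandas as pd

long_df = pd.DataFrame({
    "size": [1.5, 2.0, 3.0, 3.0],
    "col": ["col1", "col1", "col1", "col2"],
    "value": [{"val": 1.1, "id": 10}, {"val": 1.1, "id": 11},
              {"val": 1.1, "id": 20}, {"val": 1.1, "id": 6}],
})

# Reshape: one row per size, one column per distinct value of "col".
df = long_df.pivot(index="size", columns="col", values="value").reset_index()
df.columns.name = None  # drop the leftover "col" label on the column axis
print(df)
```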
Using pivot preserves the dict values, and since I was able to control the format of the data, this solution worked.