Pandas Group by with dict values

I have a DataFrame with the following data:

size                   col1                   col2
1.5    {'val':1.1, 'id': 10}                  None
2.0    {'val':1.1, 'id': 11}                  None
3.0    {'val':1.1, 'id': 20}                  None
3.0                     None   {'val':1.1, 'id': 6}

I am trying to merge the rows and remove the None values, but any aggregation I try, such as df.groupby(by=['size']).max(), converts the dict values to NaN.

Is there a way to merge these rows and keep the dict values?

Expected Result:

size                   col1                   col2
1.5    {'val':1.1, 'id': 10}                  None
2.0    {'val':1.1, 'id': 11}                  None
3.0    {'val':1.1, 'id': 20}   {'val':1.1, 'id': 6}

Rows sharing the same size (here the two rows with size=3.0) are merged into one, and each column keeps its non-None value.

CodePudding user response:

Try groupby with first, which keeps the first non-null value of each column within a group:

out = df.groupby('size').first()  # append .reset_index() to keep size as a column
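
A minimal runnable sketch of this suggestion, assuming the missing entries are real None values. The DataFrame construction is an assumption reconstructed from the table in the question. Since first() only picks the first non-null entry per column, it should not need to compare the dicts with each other:

```python
import pandas as pd

# Rebuild the question's frame (assumed construction); dict cells are plain Python objects.
df = pd.DataFrame({
    "size": [1.5, 2.0, 3.0, 3.0],
    "col1": [{"val": 1.1, "id": 10}, {"val": 1.1, "id": 11}, {"val": 1.1, "id": 20}, None],
    "col2": [None, None, None, {"val": 1.1, "id": 6}],
})

# first() picks the first non-null value per column within each size group.
out = df.groupby("size").first().reset_index()
print(out)
```

The two size=3.0 rows should collapse into one row that holds both dicts, while the other sizes keep a missing value in col2.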

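Update: if the missing entries are stored as the string 'None' rather than as real None values, replace them with NaN first (this needs numpy):

import numpy as np
out = df.replace({'None': np.nan}).groupby('size').first()  # append .reset_index() to keep size as a column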
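
A small sketch of when the replace step matters, assuming a hypothetical variant of the data in which the missing cells hold the string 'None'. A string counts as a valid value, so it has to be turned into NaN before first() can skip it:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: missing cells are the string 'None'.
df = pd.DataFrame({
    "size": [1.5, 3.0, 3.0],
    "col1": [{"val": 1.1, "id": 10}, {"val": 1.1, "id": 20}, "None"],
    "col2": ["None", "None", {"val": 1.1, "id": 6}],
})

# Turn the placeholder strings into NaN so that first() treats them as missing.
cleaned = df.replace({"None": np.nan})
out = cleaned.groupby("size").first().reset_index()
print(out)
```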

CodePudding user response:

After a lot of reading and testing, it looks like pandas groupby aggregations such as max() only work reliably on values that can be compared, such as numbers. In my case the dict values were dropped or turned into NaN.
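
One plausible reason for this behaviour with max() is that Python dicts cannot be ordered, so there is no "largest" dict to pick. The sketch below shows this in plain Python, without pandas. The helper try_max is a hypothetical name used only for illustration:

```python
a = {"val": 1.1, "id": 20}
b = {"val": 1.1, "id": 6}

def try_max(values):
    """Return max(values), or the error message if the values cannot be ordered."""
    try:
        return max(values)
    except TypeError as exc:
        return f"TypeError: {exc}"

dict_result = try_max([a, b])         # dicts do not support < or >
numeric_result = try_max([1.5, 3.0])  # numbers can be ordered
print(dict_result)
print(numeric_result)
```

How pandas reacts to that failure (raising, dropping the column, or producing NaN) may depend on the pandas version.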

My solution required reframing the whole problem. The new data format is:

size    col                  value
1.5    col1  {'val':1.1, 'id': 10}
2.0    col1  {'val':1.1, 'id': 11}
3.0    col1  {'val':1.1, 'id': 20}
3.0    col2   {'val':1.1, 'id': 6}

With this new data format I was able to use:

df = df.pivot(index='size',columns='col', values='value')
df = df.reset_index()

Using pivot preserves the dict values, and since I was able to control the format of the data, this solution worked.
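
A self-contained sketch of the pivot approach, assuming the long-format frame is built as below. The names long_df and wide are illustrative and not from the original post. pivot only rearranges the cells, so the dicts are carried over untouched:

```python
import pandas as pd

# Long format: one row per (size, col) pair, with the dict stored in 'value'.
long_df = pd.DataFrame({
    "size": [1.5, 2.0, 3.0, 3.0],
    "col": ["col1", "col1", "col1", "col2"],
    "value": [
        {"val": 1.1, "id": 10},
        {"val": 1.1, "id": 11},
        {"val": 1.1, "id": 20},
        {"val": 1.1, "id": 6},
    ],
})

# Each distinct 'col' entry becomes a column; missing combinations become NaN.
wide = long_df.pivot(index="size", columns="col", values="value").reset_index()
print(wide)
```

Note that pivot raises an error if the same (size, col) pair appears more than once, so this relies on each pair being unique.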