I have the following dataframe:
id  item  item_cost  order_total
 1     A          6           10
 1     B          4          NaN
 2     A          5            5
 3     C         12           12
The 'id' column contains duplicate keys; rows that share an id belong to the same order. order_total is the sum of item_cost over all rows with that id. I would now like to copy the order_total into every row of the same order. For example, both rows with id = 1 should have an order_total of 10, but one of them currently has NaN.
This dataframe is simply read in from a CSV, so I have done no calculations on any of these columns.
The simplified logic I am trying to achieve is: if column id is a duplicate, fill NaN values with the non-NaN value from a row with the same id.
I have tried the following code:
print(df.groupby('id',as_index=False).sum())
However, this collapses each order into a single row, so I lose the item name, which I need for further analysis.
CodePudding user response:
Try this. transform keeps the original rows, and 'first' picks the first non-NaN value in each group:
df['order_total'] = df.groupby('id')['order_total'].transform('first')
print(df)
   id item  item_cost  order_total
0   1    A          6         10.0
1   1    B          4         10.0
2   2    A          5          5.0
3   3    C         12         12.0
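For anyone who wants to reproduce this without the CSV, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand, and the names filled and recomputed are just illustrative. It also shows a possible alternative: since the question says order_total is the sum of item_cost per id, the total could be recomputed with transform('sum'). That second option assumes every item row of an order is actually present in the file.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question (normally this comes from the CSV).
df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "item": ["A", "B", "A", "C"],
    "item_cost": [6, 4, 5, 12],
    "order_total": [10, np.nan, 5, 12],
})

# Option 1: broadcast the first non-NaN order_total of each id to all its rows.
filled = df.groupby("id")["order_total"].transform("first")

# Option 2 (assumption: all items of an order are in the file):
# recompute the total from item_cost instead of relying on the stored value.
recomputed = df.groupby("id")["item_cost"].transform("sum")

# Keep every row and the item column; only order_total is overwritten.
df["order_total"] = filled
print(df)
```

Both options return a Series aligned with the original index, which is why the item column and the row count are left untouched, unlike groupby(...).sum().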