Pandas: how to replace NaN in rows with duplicate keys

Time:11-03

I have the following dataframe:

  id  item  item_cost  order_total
  1   A     6          10
  1   B     4          NaN
  2   A     5          5
  3   C     12         12

The 'id' column contains duplicate keys: rows with the same id belong to the same order, and order_total is the sum of item_cost over those rows. I would like to copy order_total into every row of the same order. For example, both rows with id = 1 should have an order_total of 10, but currently one of them has NaN.
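Since order_total is described as the sum of item_cost within each id, one option is to recompute the column instead of filling it. This is a minimal sketch, assuming the totals in the CSV really are those sums; the sample data is typed in by hand rather than read from the file.

```python
import pandas as pd

# Sample data from the question, typed in by hand (assumed to match the CSV).
df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "item": ["A", "B", "A", "C"],
    "item_cost": [6, 4, 5, 12],
    "order_total": [10, float("nan"), 5, 12],
})

# transform('sum') returns one value per original row, aligned to df's index,
# so every row of an order receives that order's total and no rows are dropped.
df["order_total"] = df.groupby("id")["item_cost"].transform("sum")
print(df)
```

Unlike groupby(...).sum(), transform keeps the original rows, so the item column survives. If the stored totals could differ from the item sums (discounts, shipping, etc.), filling from the existing values is safer than recomputing.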

The dataframe is read directly from a CSV file, so I have not computed any of these columns myself.

In simplified form, the logic I want is: if an id appears in more than one row, fill any NaN in order_total with the non-NaN value from another row with the same id.
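That fill-from-the-same-id logic can be sketched as a forward fill followed by a backward fill within each group. This assumes each id has at most one distinct non-NaN order_total; ffill and bfill only replace NaN cells, so existing values are left untouched.

```python
import pandas as pd

# Sample data from the question, typed in by hand.
df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "item": ["A", "B", "A", "C"],
    "item_cost": [6, 4, 5, 12],
    "order_total": [10, float("nan"), 5, 12],
})

# Within each id, propagate the known total forward and then backward, so a
# NaN picks up the value from another row of the same order whether that row
# comes before or after it.
df["order_total"] = df.groupby("id")["order_total"].transform(
    lambda s: s.ffill().bfill()
)
print(df)
```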

I have tried the following code:

print(df.groupby('id',as_index=False).sum())

However, this collapses each order into a single row, so I lose the item names, which I need for further analysis.

Answer:

Try this:

df['order_total'] = df.groupby('id').order_total.transform('first')

print(df)

   id item  item_cost  order_total
0   1    A          6         10.0
1   1    B          4         10.0
2   2    A          5          5.0
3   3    C         12         12.0
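To try the answer end to end, here is a self-contained version with the question's data typed in by hand (in practice df would come from pd.read_csv). It relies on GroupBy.first returning the first non-null value in each group, so the result would be the same even if the NaN row came before the row holding the total; the small reordered frame at the end is a hypothetical example to show that.

```python
import pandas as pd

# The question's sample data; in practice this would come from pd.read_csv(...).
df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "item": ["A", "B", "A", "C"],
    "item_cost": [6, 4, 5, 12],
    "order_total": [10, float("nan"), 5, 12],
})

# transform('first') broadcasts each group's first non-null order_total
# back to every row of that group, keeping the original row count.
df["order_total"] = df.groupby("id").order_total.transform("first")
print(df)

# Hypothetical reordered case: the NaN row comes first within id 1.
# 'first' skips NaN, so both rows still end up with 10.
reordered = pd.DataFrame({"id": [1, 1], "order_total": [float("nan"), 10]})
reordered["order_total"] = reordered.groupby("id").order_total.transform("first")
print(reordered)
```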