Home > Mobile >  How to fill NaN in a column based on the values in another column?
How to fill NaN in a column based on the values in another column?

Time:09-27

I am working on this project where I need to fill a NaN cell from the reason column, but I need to cross-check with the status column first and fill in the appropriate values.

The condition is:

  • If STATUS is "Invoiced" and REASON is NaN, then fill with "Not Applicable"
  • If STATUS is "Lost Sale" and REASON is NaN, then fill with "Unknown"

I have tried this code:

data2 = np.where((data["SO_STATUS"] == "Invoiced")&(data["SO_LOSREASON"]=='NaN'),'null',"Not Applicable")
data2 = pd.DataFrame(data2, columns = ['SO_STATUS'])
data= data.join(data2)

However, it says:

ValueError: columns overlap but no suffix specified: Index(['SO_STATUS'], dtype='object')

I am not sure if what I did is correct or not since I still need the other data from reason column that are not NaN. Thank you so much for helping!

CodePudding user response:

Try using direct column assignment:

data['SO_STATUS'] = np.where((data["SO_STATUS"] == "Invoiced") & data["SO_LOSREASON"].isna(),, "Not Applicable", "Unknown")

CodePudding user response:

You can use map and where, thus it is easy to add any other condition:

data['new_column'] = (data['SO_STATUS'].map({'Invoiced': 'Not Applicable',
                                             'Lost Sale': 'Unknown',
                                             # add other here: 'Other': 'other',
                                            })
                                       .where(data['SO_LOSREASON'].isna())
                      )

example (setting a new column new_column for clarity):

   SO_STATUS SO_LOSREASON      new_column
0   Invoiced          NaN  Not Applicable
1  Lost Sale          NaN         Unknown
2      Other          NaN             NaN
3   Invoiced          abc             NaN
  • Related