Updating a column with another column's values in python but only NaN values

Time:07-28

Hi, I'm trying to use the .update() function to update the values of one column (B) with values from another column (C) in the same dataframe. However, I only want to replace the values in column B that are NaN. I have tried using the overwrite=False argument, but I keep getting an error:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, np.nan, 600],
                   'C': [32, 54, 300]})
df['B'].update(df['C'], overwrite=False)  # raises the TypeError shown below
df

#Output:
#TypeError: update() got an unexpected keyword argument 'overwrite'

#The intended output I'm looking for is this: 
    A   B       C
0   1   400.0   32
1   2   54.0    54
2   3   600.0   300

However, when I use the .update() function to update the whole dataframe, the overwrite argument works:

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, np.nan, 600],
                   'C': [32,54,300]})
new_df = pd.DataFrame({'B': [4, 5, 6],
                       'C': [7, 8, 9]})
df.update(new_df, overwrite = False) 
df

#Output: 
    A   B       C
0   1   400.0   32
1   2   5.0     54
2   3   600.0   300

Does the overwrite argument just not work when updating a column with another column in the same dataframe? How can I update a column with another column's values while only overwriting the values that are NaN? Thank you so much!

CodePudding user response:

To answer your question, and this is pretty important: pd.DataFrame.update is not the same as pd.Series.update. df['B'] is a Series, and there is no overwrite parameter in pd.Series.update. You must know the object type you are working with.
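
If you want to confirm this on your own installation, here is a quick sketch that compares the two signatures with the standard library's inspect module. The variable names are only illustrative.

```python
import inspect

import pandas as pd

# Parameter names accepted by each update() method
frame_params = list(inspect.signature(pd.DataFrame.update).parameters)
series_params = list(inspect.signature(pd.Series.update).parameters)

print("overwrite" in frame_params)   # DataFrame.update accepts overwrite
print("overwrite" in series_params)  # Series.update does not
```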

Let's use fillna:

df['B'] = df['B'].fillna(df['C'])

Output:

   A      B    C
0  1  400.0   32
1  2   54.0   54
2  3  600.0  300

Alternatively, mask, where, or combine_first would also work, among a few other ways.
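
For example, a minimal sketch of two of those alternatives on the question's data (filled_cf and filled_where are just illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, np.nan, 600],
                   'C': [32, 54, 300]})

# combine_first keeps B where it is not NaN and falls back to C otherwise
filled_cf = df['B'].combine_first(df['C'])

# where keeps B where the condition is True and takes C elsewhere
filled_where = df['B'].where(df['B'].notna(), df['C'])

df['B'] = filled_cf
print(df)
```

Both should give the same column as fillna here, so which one to use is mostly a matter of taste.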

CodePudding user response:

As mentioned in the comments, only pandas.DataFrame.update has an overwrite parameter. df['B'] is a pandas.Series, so df['B'].update calls pandas.Series.update, which does not accept overwrite.

You can use pandas.Series.mask or numpy.where.

msk = df['B'].isnull()
df['B'] = df['B'].mask(msk, df['C'])
# Or
df['B'] = np.where(msk, df['C'], df['B'])
print(df)

   A      B    C
0  1  400.0   32
1  2   54.0   54
2  3  600.0  300

CodePudding user response:

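You can use rename for that: swap the names of B and C in a renamed copy so that DataFrame.update lines each column up with the other one, and pass overwrite=False so that only NaN values are replaced: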

df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, np.nan, 600], 'C': [32, 54, 300]})

df.update(df.rename(columns={'B':'C', 'C':'B'}), overwrite = False) 

print(df)

Output:

   A      B    C
0  1  400.0   32
1  2   54.0   54
2  3  600.0  300
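
One thing to keep in mind with the rename approach is that the swap goes both ways, so any NaN in C would also be filled from B. A small sketch with a modified version of the question's data (the NaN in C is added here purely for illustration):

```python
import numpy as np
import pandas as pd

# Same frame as in the question, but with a NaN added to C as well
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, np.nan, 600],
                   'C': [32, 54, np.nan]})

# Swapped copy: its 'B' holds the old C values and its 'C' holds the old B values
swapped = df.rename(columns={'B': 'C', 'C': 'B'})

# overwrite=False only fills positions that are NaN in df
df.update(swapped, overwrite=False)

print(df)
```

If only B should change, fillna on the single column is the safer choice.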