Hi, I'm trying to use the .update() method to update the values of one column (B) with values from another column (C) in the same DataFrame. However, I only want to replace the values in column B that are NaN. I tried passing overwrite=False, but I keep getting an error:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, np.nan, 600],
                   'C': [32, 54, 300]})
df['B'].update(df['C'], overwrite=False)
df
# Output:
# TypeError: update() got an unexpected keyword argument 'overwrite'
# The intended output I'm looking for is this:
   A      B    C
0  1  400.0   32
1  2   54.0   54
2  3  600.0  300
However, when I call .update() on the whole DataFrame, the overwrite argument works:
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, np.nan, 600],
                   'C': [32, 54, 300]})
new_df = pd.DataFrame({'B': [4, 5, 6],
                       'C': [7, 8, 9]})
df.update(new_df, overwrite=False)
df
# Output:
   A      B    C
0  1  400.0   32
1  2    5.0   54
2  3  600.0  300
Does the overwrite argument just not work when updating a column with another column in the same DataFrame? How can I update a column with another column's values while only overwriting the values that are NaN? Thank you so much!
CodePudding user response:
To answer your question, and this is pretty important: pd.DataFrame.update is not the same as pd.Series.update. df['B'] is a Series, so df['B'].update(...) calls pd.Series.update, and there is no overwrite parameter in pd.Series.update. You must know the object type you are working with.
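As a minimal sketch of that difference, the snippet below uses two standalone Series (the names b and c are only illustrative stand-ins for df['B'] and df['C']). It shows that Series.update rejects the overwrite keyword, and that without it every non-NaN value in the other Series replaces the existing one, so it is not a NaN-only fill.

```python
import numpy as np
import pandas as pd

b = pd.Series([400, np.nan, 600])  # illustrative stand-in for df['B']
c = pd.Series([32, 54, 300])       # illustrative stand-in for df['C']

# Series.update accepts only `other`, so passing overwrite raises TypeError
try:
    b.update(c, overwrite=False)
    raised = False
except TypeError:
    raised = True

# Without the keyword, every non-NaN value in c replaces the value in b,
# including the 400 and 600 that should have been kept
b.update(c)

print(raised)      # True
print(b.tolist())  # [32.0, 54.0, 300.0]
```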
Let's use fillna:
df['B'] = df['B'].fillna(df['C'])
Output:
   A      B    C
0  1  400.0   32
1  2   54.0   54
2  3  600.0  300
Series.mask, Series.where, and Series.combine_first are a few other ways to get the same result.
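A rough sketch of two of those alternatives, assuming the same sample frame as in the question (the variable names filled_combine and filled_where are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, np.nan, 600],
                   'C': [32, 54, 300]})

# combine_first: keep B where it is not NaN, otherwise take the value from C
filled_combine = df['B'].combine_first(df['C'])

# where: keep B where the condition is True (B is not NaN), otherwise use C
filled_where = df['B'].where(df['B'].notna(), df['C'])

print(filled_combine.tolist())  # [400.0, 54.0, 600.0]
print(filled_where.tolist())    # [400.0, 54.0, 600.0]
```

Either result can be assigned back with df['B'] = ..., just like the fillna version.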
CodePudding user response:
As mentioned in the comments, only pandas.DataFrame.update has an overwrite parameter. df['B'] is a pandas.Series, so df['B'].update calls pandas.Series.update, which does not accept overwrite.
You can use pandas.Series.mask or numpy.where instead.
msk = df['B'].isnull()
df['B'] = df['B'].mask(msk, df['C'])
# Or
df['B'] = np.where(msk, df['C'], df['B'])
print(df)
   A      B    C
0  1  400.0   32
1  2   54.0   54
2  3  600.0  300
CodePudding user response:
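You can use rename for that. Swapping the column names lets DataFrame.update, which does accept overwrite, fill the NaN values in B from C (it would likewise fill any NaN values in C from B):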
df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, np.nan, 600], 'C': [32, 54, 300]})
df.update(df.rename(columns={'B':'C', 'C':'B'}), overwrite = False)
print(df)
Output:
   A      B    C
0  1  400.0   32
1  2   54.0   54
2  3  600.0  300
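If only column B should ever be touched, one possible variation is to pass a one-column frame instead of swapping both names. This is a sketch under the same sample data, not part of the answer above; the name source is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [400, np.nan, 600],
                   'C': [32, 54, 300]})

# One-column frame named 'B' that carries the values of C
source = df[['C']].rename(columns={'C': 'B'})

# overwrite=False fills only the positions where df is NaN;
# A and C have no matching data to take from source, so they are left alone
df.update(source, overwrite=False)

print(df['B'].tolist())  # [400.0, 54.0, 600.0]
```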