I can't work out why updating a filtered DataFrame is not working. The code does not raise any error message either. I'd be grateful for any hints or help.
The problem appears when I want to update the DataFrame, but only for a given selection of rows. The .update method of a DataFrame updates its values from another DataFrame, matching rows on the index. But it does not seem to do anything when applied to a filtered DataFrame.
Sample data:
df_1
index Name Surname
R222 Katrin Johnes
R343 John Doe
R377 Steven Walkins
R914 NaN NaN
df_2
index Name Surname
R222 Pablo Picasso
R343 Jarque Berry
R377 Christofer Bishop
R914 Marie Sklodowska-Curie
Code:
df_1.update(df_2, overwrite = False)
Returns:
df_1
index Name Surname
R222 Katrin Johnes
R343 John Doe
R377 Steven Walkins
R914 Marie Sklodowska-Curie
While the code below:
df_1[(df_1["Name"].notna()) & (df_1["Surname"].notna())].update(df_2, overwrite = False) #not working
does not apply any updates to the DataFrame.
Returns:
df_1
index Name Surname
R222 Katrin Johnes
R343 John Doe
R377 Steven Walkins
R914 NaN NaN
I'm looking for help with solving this and an explanation of why it happens. Thanks!
CodePudding user response:
EDIT: If you only need to replace missing values with values from another DataFrame, use DataFrame.fillna or DataFrame.combine_first:
df = df_1.fillna(df_2)
#alternative
#df = df_1.combine_first(df_2)
print (df)
Name Surname
index
R222 Katrin Johnes
R343 John Doe
R377 Steven Walkins
R914 Marie Sklodowska-Curie
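To compare these options side by side, here is a minimal sketch that rebuilds the sample data from the question (the construction code itself is an assumption, since the question only shows the printed frames) and runs fillna, combine_first and update(overwrite=False) on it. For this data all three should fill only the missing R914 row.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (construction is assumed).
idx = pd.Index(["R222", "R343", "R377", "R914"], name="index")
df_1 = pd.DataFrame(
    {"Name": ["Katrin", "John", "Steven", np.nan],
     "Surname": ["Johnes", "Doe", "Walkins", np.nan]},
    index=idx,
)
df_2 = pd.DataFrame(
    {"Name": ["Pablo", "Jarque", "Christofer", "Marie"],
     "Surname": ["Picasso", "Berry", "Bishop", "Sklodowska-Curie"]},
    index=idx,
)

# fillna and combine_first both return a new DataFrame.
filled = df_1.fillna(df_2)
combined = df_1.combine_first(df_2)

# update works in place and returns None, so run it on a copy here.
updated = df_1.copy()
updated.update(df_2, overwrite=False)

print(filled)
print(combined)
print(updated)
```

Note that fillna and combine_first leave df_1 untouched, so the result has to be assigned to a name, whereas update changes the object it is called on.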
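It is not working because update modifies the object it is called on in place, and the filtered subset of the DataFrame is a new object, so df_1 itself is never touched. A possible (ugly) solution is to update the filtered DataFrame df and then add back the original rows that did not match: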
m = (df_1["Name"].notna()) & (df_1["Surname"].notna())
df = df_1[m].copy()
df.update(df_2)
df = pd.concat([df, df_1[~m]]).sort_index()
print (df)
Name Surname
index
R222 Pablo Picasso
R343 Jarque Berry
R377 Christofer Bishop
R914 NaN NaN
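The reason the one-liner from the question has no visible effect can be checked directly. The sketch below rebuilds the sample data (the construction code is an assumption), keeps the filtered frame under a name, and shows that update changes that frame while df_1 stays as it was. It is also worth noting that with overwrite = False the call in the question would not change even the subset, because the filter keeps only rows without NaN and overwrite = False fills only NaN values.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (construction is assumed).
idx = pd.Index(["R222", "R343", "R377", "R914"], name="index")
df_1 = pd.DataFrame(
    {"Name": ["Katrin", "John", "Steven", np.nan],
     "Surname": ["Johnes", "Doe", "Walkins", np.nan]},
    index=idx,
)
df_2 = pd.DataFrame(
    {"Name": ["Pablo", "Jarque", "Christofer", "Marie"],
     "Surname": ["Picasso", "Berry", "Bishop", "Sklodowska-Curie"]},
    index=idx,
)

m = df_1["Name"].notna() & df_1["Surname"].notna()

# Boolean indexing already returns new data; the explicit .copy() is only
# there to avoid a possible SettingWithCopyWarning in older pandas versions.
subset = df_1[m].copy()
subset.update(df_2)  # overwrite=True is the default

print(subset)  # rows R222, R343, R377 now hold the df_2 values
print(df_1)    # still the original values
```

In the question the filtered frame is never assigned to a name, so the updated object is discarded as soon as the statement finishes.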
A possible solution without update:
m = (df_1["Name"].notna()) & (df_1["Surname"].notna())
df_1[m] = df_2
print (df_1)
Name Surname
index
R222 Pablo Picasso
R343 Jarque Berry
R377 Christofer Bishop
R914 NaN NaN
CodePudding user response:
update applies its modifications in place, so if you select a subset of your DataFrame, only that subset is modified and not your original DataFrame.
Use mask:
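# Hide the df_2 rows whose df_1 counterpart has any NaN, then update in place.
# axis must be passed by keyword; any(1) fails on pandas 2.x.
df_1.update(df_2.mask(df_1.isna().any(axis=1)))
print(df_1)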
# Output:
Name Surname
index
R222 Pablo Picasso
R343 Jarque Berry
R377 Christofer Bishop
R914 NaN NaN
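If this pattern is needed in more than one place, the mask-based approach can be wrapped in a small helper. This is only a sketch: the name update_rows and its signature are hypothetical, not part of pandas, and the sample data is rebuilt from the question under the same assumption as before. It relies on update skipping NaN values in the frame passed to it, so rows blanked out by where are left alone in the target.

```python
import numpy as np
import pandas as pd


def update_rows(target, source, row_mask):
    """Hypothetical helper: overwrite rows of `target` where `row_mask` is
    True with the matching rows of `source`, modifying `target` in place."""
    # where keeps the rows of `source` selected by the mask and sets the
    # rest to NaN; update then ignores those NaN values.
    target.update(source.where(row_mask))


# Sample data reconstructed from the question (construction is assumed).
idx = pd.Index(["R222", "R343", "R377", "R914"], name="index")
df_1 = pd.DataFrame(
    {"Name": ["Katrin", "John", "Steven", np.nan],
     "Surname": ["Johnes", "Doe", "Walkins", np.nan]},
    index=idx,
)
df_2 = pd.DataFrame(
    {"Name": ["Pablo", "Jarque", "Christofer", "Marie"],
     "Surname": ["Picasso", "Berry", "Bishop", "Sklodowska-Curie"]},
    index=idx,
)

# Select the rows of df_1 that are fully populated.
complete = df_1.notna().all(axis=1)
update_rows(df_1, df_2, complete)
print(df_1)
```

Because update is called on df_1 itself rather than on a filtered copy, the change is visible in df_1 afterwards.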