Updating filtered data frame in pandas

Time:12-14

I can't figure out why updating a filtered data frame does not work. The code does not raise any error message either. I'd be grateful for any hints or help.

The problem appears when I want to update the data frame, but only for a given selection of rows. The .update method on a DataFrame updates its values from another DataFrame, matching rows on the index. However, it does not seem to do anything when applied to a filtered DataFrame.

Sample data:

df_1

index   Name    Surname 
R222    Katrin  Johnes      
R343    John    Doe
R377    Steven  Walkins 
R914    NaN NaN

df_2

index   Name    Surname 
R222    Pablo   Picasso     
R343    Jarque  Berry
R377    Christofer  Bishop
R914    Marie   Sklodowska-Curie
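
To reproduce the tables above, the two frames could be built roughly like this. This is only a sketch: it assumes that the "index" column is the DataFrame index and that the NaN entries in df_1 are real missing values (np.nan) rather than the string "NaN".

```python
import numpy as np
import pandas as pd

# Shared row labels; naming the index "index" is an assumption based on the tables.
idx = pd.Index(["R222", "R343", "R377", "R914"], name="index")

# df_1 has one row (R914) with missing values in both columns.
df_1 = pd.DataFrame(
    {
        "Name": ["Katrin", "John", "Steven", np.nan],
        "Surname": ["Johnes", "Doe", "Walkins", np.nan],
    },
    index=idx,
)

# df_2 holds the replacement values for every row label.
df_2 = pd.DataFrame(
    {
        "Name": ["Pablo", "Jarque", "Christofer", "Marie"],
        "Surname": ["Picasso", "Berry", "Bishop", "Sklodowska-Curie"],
    },
    index=idx,
)

print(df_1)
print(df_2)
```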

Code:

df_1.update(df_2, overwrite = False) 

Returns:

df_1

index   Name    Surname 
R222    Katrin  Johnes      
R343    John    Doe
R377    Steven  Walkins 
R914    Marie   Sklodowska-Curie

While the code below:

df_1[(df_1["Name"].notna()) & (df_1["Surname"].notna())].update(df_2, overwrite = False) #not working

does not apply any updates to the data frame.
df_1 afterwards:

index   Name    Surname 
R222    Katrin  Johnes      
R343    John    Doe
R377    Steven  Walkins 
R914    NaN NaN 

I'm looking for help with solving this and an explanation of why it happens. Thanks!

CodePudding user response:

EDIT: If you only need to replace missing values with values from another DataFrame, use DataFrame.fillna or DataFrame.combine_first:

df = df_1.fillna(df_2)
#alternative   
#df = df_1.combine_first(df_2)

print (df)
         Name           Surname
index                          
R222   Katrin            Johnes
R343     John               Doe
R377   Steven           Walkins
R914    Marie  Sklodowska-Curie

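It is not working because update modifies the filtered subset of the DataFrame in place, and that subset is a new object rather than df_1 itself. A possible (ugly) solution is to update the filtered DataFrame df and then add back the original rows that did not match:

import pandas as pd

m = (df_1["Name"].notna()) & (df_1["Surname"].notna())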
df = df_1[m].copy()

df.update(df_2)

df = pd.concat([df, df_1[~m]]).sort_index()
print (df)
             Name  Surname
index                     
R222        Pablo  Picasso
R343       Jarque    Berry
R377   Christofer   Bishop
R914          NaN      NaN

Possible solution without update:

m = (df_1["Name"].notna()) & (df_1["Surname"].notna())

df_1[m] = df_2
print (df_1)
             Name  Surname
index                     
R222        Pablo  Picasso
R343       Jarque    Berry
R377   Christofer   Bishop
R914          NaN      NaN

CodePudding user response:

update applies its modifications in place, so if you select a subset of your dataframe, only that subset is modified and not your original dataframe.
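
A small sketch of that behaviour, using a cut-down, hypothetical two-row version of the frames from the question. The .copy() call is only there to avoid the SettingWithCopyWarning that some pandas versions emit; boolean indexing already returns a new object.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-ins for df_1 and df_2 from the question.
df_1 = pd.DataFrame({"Name": ["Katrin", np.nan]}, index=["R222", "R914"])
df_2 = pd.DataFrame({"Name": ["Pablo", "Marie"]}, index=["R222", "R914"])

mask = df_1["Name"].notna()

# Boolean indexing builds a new DataFrame; it is not a view onto df_1.
subset = df_1[mask].copy()

# update works in place on `subset` and returns None.
result = subset.update(df_2)

print(result)  # None
print(subset)  # the temporary object was updated
print(df_1)    # the original frame is unchanged
```

In the one-liner from the question, the updated object is never assigned to a name, so the change is lost as soon as the statement finishes.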

Use mask:

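# Blank out the rows of df_2 where df_1 has any missing value, so update skips them.
# axis must be passed by keyword: the positional form any(1) fails in pandas 2.0+.
df_1.update(df_2.mask(df_1.isna().any(axis=1)))
print(df_1)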

# Output:
             Name  Surname
index                     
R222        Pablo  Picasso
R343       Jarque    Berry
R377   Christofer   Bishop
R914          NaN      NaN