Pandas: for matching row indices - update dataframe values with values from other dataframe with a d

Time:01-21

I'm struggling to update values in one dataframe with values from another dataframe, using the row index as the key. The dataframes do not have the same number of columns, so updating can only occur for matching columns. With the code below, I expect df3 to yield the same result as df4. However, df3 is None.

Can anyone point me in the right direction? It doesn't seem very complicated, but I can't seem to get it right.

PS: In reality, the two dataframes are a lot larger than the ones in this example (in terms of both rows and columns).

```python
import pandas as pd

data1 = {'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7], 'C': [7, 8, 9, 10]}
df1 = pd.DataFrame(data1, index=['I_1', 'I_2', 'I_3', 'I_4'])
print(df1)

data2 = {'A': [10, 40], 'B': [40, 70]}
df2 = pd.DataFrame(data2, index=['I_1', 'I_4'])
print(df2)

df3 = df1.update(df2)
print(df3)

# Expected result
data4 = {'A': [10, 2, 3, 40], 'B': [40, 5, 6, 70], 'C': [7, 8, 9, 10]}
df4 = pd.DataFrame(data4, index=['I_1', 'I_2', 'I_3', 'I_4'])
print(df4)
```

CodePudding user response:

pandas.DataFrame.update returns None. The method modifies the calling object in place.

Source: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.update.html

For your example, this means two things:

  • update returns None, hence df3 is None.
  • df1 is changed when df3 = df1.update(df2) is called. In your case, df1 looks like df4 from that point on.
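
The two points above can be checked with a short sketch. It reuses the frames from the question; the variable name result is only illustrative, and no output is shown for the updated columns because their dtype may differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7], 'C': [7, 8, 9, 10]},
    index=['I_1', 'I_2', 'I_3', 'I_4'],
)
df2 = pd.DataFrame({'A': [10, 40], 'B': [40, 70]}, index=['I_1', 'I_4'])

# update() works in place and has no return value
result = df1.update(df2)

print(result is None)      # True
print(df1['A'].tolist())   # values at I_1 and I_4 now come from df2
print(df1['C'].tolist())   # column C is absent from df2 and stays unchanged
```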

To create df3 and leave df1 untouched, copy df1 first and update the copy:

```python
import pandas as pd

data1 = {'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7], 'C': [7, 8, 9, 10]}
df1 = pd.DataFrame(data1, index=['I_1', 'I_2', 'I_3', 'I_4'])
print(df1)

data2 = {'A': [10, 40], 'B': [40, 70]}
df2 = pd.DataFrame(data2, index=['I_1', 'I_4'])
print(df2)

# Use a deep copy (the default) so that df1 is not affected by update();
# a shallow copy (deep=False) can share its data with df1
df3 = df1.copy()
df3.update(df2)
print(df3)

# Expected result
data4 = {'A': [10, 2, 3, 40], 'B': [40, 5, 6, 70], 'C': [7, 8, 9, 10]}
df4 = pd.DataFrame(data4, index=['I_1', 'I_2', 'I_3', 'I_4'])
print(df4)
```
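
One way to confirm that copying first and then updating gives the expected frame is to compare df3 with df4 directly. This is a minimal sketch: the helper name updated_copy is hypothetical, and check_dtype=False is an assumption made because, depending on the pandas version, update may turn the integer columns into floats.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def updated_copy(base, other):
    """Return a copy of base updated with the non-NA values of other (hypothetical helper)."""
    result = base.copy()   # deep copy by default, so base stays untouched
    result.update(other)   # in place on the copy; aligns on index and column labels
    return result


index = ['I_1', 'I_2', 'I_3', 'I_4']
df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 5, 6, 7], 'C': [7, 8, 9, 10]}, index=index)
df2 = pd.DataFrame({'A': [10, 40], 'B': [40, 70]}, index=['I_1', 'I_4'])
df4 = pd.DataFrame({'A': [10, 2, 3, 40], 'B': [40, 5, 6, 70], 'C': [7, 8, 9, 10]}, index=index)

df3 = updated_copy(df1, df2)

# Raises AssertionError if the values differ; dtypes are deliberately ignored
assert_frame_equal(df3, df4, check_dtype=False)
print(df1.loc['I_1', 'A'])  # df1 still holds its original value here
```

Returning the updated copy from a function also avoids the original pitfall of assigning the None return value of update to a variable.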