Home > database >  Copy contents from one Dataframe to another based on column values in Pandas
Copy contents from one Dataframe to another based on column values in Pandas

Time:05-08

Two seperate similar DataFrames with different lengths

df2=

Index First Name Age Gender Weight
0 James 25 Male 155
1 John 27 Male 175
2 Patricia 23 Female 135
3 Mary 22 Female 125
4 Martin 30 Male 185
5 Margaret 29 Female 141
6 Kevin 22 Male 198

df1=

Index First Name Age Gender Weight Height
0 James 25 Male 165 5'10
1 John 27 Male 175 5'9
2 Matthew 29 Male 183 6'0
3 Patricia 23 Female 135 5'3
4 Mary 22 Female 125 5'4
5 Rachel 29 Male 123 5'3
6 Jose 20 Male 175 5'11
7 Kevin 22 Male 192 6'2

df2 has some rows which are not in df1 and df1 has some values which are not in df2.

I am comparing df1 against df2. I have calculated the newentries with the following code

newentries = df2.loc[~df2['First Name'].isin(df1['First Name'])]
deletedentries = df1.loc[~df1['First Name'].isin(df2['First Name'])]

where newentries denote the rows/entries that are there in df2 but not in df1; deletedentries denote the rows/entries that are there in df1 but not in df2. The above code works perfectly fine.

I need to copy the height from df1 to df2 when the first names are equal.

df2.loc[df2['First Name'].isin(df1['First Name']),"Height"] = df1.loc[dfxh2['First Name'].isin(df2['First Name']),"Height"]

The above code copies the values however indexing is causing an issue and the values are not copied to the corresponding rows, I tried to promote First Name as the Index but that doesn't solve the issue. Please help me with a solution

Also, I need to calculate the modified values, if the First Name is same, I need to check for modified values; for example in df1, the weight of James is 155 however in df2 the weight is 165, so I need to store the modified weight of James(165) and index(0) in a new dataframe without iteration; the iteration takes a long time because this is a sample of a big dataframe with a lot more rows and columns.

Desired output:

df2=

Index First Name Age Gender Weight Height
0 James 25 Male 155 5'10
1 John 27 Male 175 5'9
2 Patricia 23 Female 135 5'3
3 Mary 22 Female 125 5'4
4 Martin 30 Male 185
5 Margaret 29 Female 141
6 Kevin 22 Male 198 6'2

Martin's and Margaret's heights are not there in df1, so their heights are not updated in df2

newentries=

Index First Name Age Gender Weight Height
4 Martin 30 Male 185
5 Margaret 29 Female 141

deletedentries=

Index First Name Age Gender Weight Height
2 Matthew 29 Male 183 6'0
5 Rachel 29 Male 123 5'3
6 Jose 20 Male 175 5'11

modval=

Index First Name Age Gender Weight Height
0 James 165
7 Kevin 192

CodePudding user response:

for your desired output for df2 you can try this:

desired_df2 = df2.merge(df1[['First Name','Height']], on='First Name', how='left')
#if you want to change the "NaN" values just add ".fillna(fill_value=0)" for e.g 0 after the merge
print(desired_df2)


  First Name  Age   Gender  Weight Height
0     James    25    Male      155   5'10
1      John    27    Male      175    5'9
2  Patricia    23  Female      135    5'3
3      Mary    22  Female      125    5'4
4    Martin    30    Male      185    NaN
5  Margaret    29  Female      141    NaN
6     Kevin    22    Male      198    6'2

new and deleted entries is already right. for the moment I'm a bit stuck how to get the modval dataframe. I'll update my answer if I get a solution.

CodePudding user response:

Building off of Rabinzel's answer:

output = df2.merge(df1, how='left', on='First Name', suffixes=[None, '_old'])
df3 = output[['First Name', 'Age', 'Gender', 'Weight', 'Height']]

cols = df1.columns[1:-1]
modval = pd.DataFrame()
for col in cols:
    modval = pd.concat([modval, output[['First Name', col   '_old']][output[col] != output[col   '_old']].dropna()])
    modval.rename(columns={col  '_old':col}, inplace=True)

newentries = df2[~df2['First Name'].isin(df1['First Name'])]
deletedentries = df1[~df1['First Name'].isin(df2['First Name'])]

print(df3, newentries, deletedentries, modval, sep='\n\n')

Output:

  First Name  Age  Gender  Weight Height
0      James   25    Male     155   5'10
1       John   27    Male     175    5'9
2   Patricia   23  Female     135    5'3
3       Mary   22  Female     125    5'4
4     Martin   30    Male     185    NaN
5   Margaret   29  Female     141    NaN
6      Kevin   22    Male     198    6'2

  First Name  Age  Gender  Weight
4     Martin   30    Male     185
5   Margaret   29  Female     141

  First Name  Age Gender  Weight Height
2    Matthew   29   Male     183    6'0
5     Rachel   29   Male     123    5'3
6       Jose   20   Male     175   5'11

  First Name  Age Gender  Weight
0      James  NaN    NaN   165.0
6      Kevin  NaN    NaN   192.0
  • Related