I'm trying to merge one column's values from df2 into df1. df1.merge(df2, how='outer') seems to be what I need, but the result is not what I want because of the duplicate rows. Using 'on' introduces _x and _y suffixes, which I don't want either.
In the example below, sub=site1 is the same in both df1 and df2, so 'fred' from df2 should replace 'andy' in the own column of df1.
# Pandas Merge test:
import pandas as pd
df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})
>>> df1
sub iss rem own
0 site1 enc1 1 andy
1 site2 enc2 3 brian
2 site3 enc3 5 cody
>>> df2
sub rem own
0 data1 2 david
1 data2 4 edger
2 site1 6 fred
>>> df1.merge(df2, how='outer')
sub iss rem own
0 site1 enc1 1 andy
1 site2 enc2 3 brian
2 site3 enc3 5 cody
3 data1 NaN 2 david
4 data2 NaN 4 edger
5 site1 NaN 6 fred
>>> df1.merge(df2, on='sub', how='outer')
sub iss rem_x own_x rem_y own_y
0 site1 enc1 1.0 andy 6.0 fred
1 site2 enc2 3.0 brian NaN NaN
2 site3 enc3 5.0 cody NaN NaN
3 data1 NaN NaN NaN 2.0 david
4 data2 NaN NaN NaN 4.0 edger
Expected Output:
sub iss rem own
0 site1 enc1 1 fred
1 site2 enc2 3 brian
2 site3 enc3 5 cody
3 data1 NaN 2 david
4 data2 NaN 4 edger
CodePudding user response:
A potential, fairly simple solution using pd.concat and loc: filter df1 down to just the records not present in df2, then concat the two together.
# set 'sub' as the index to make the loc / index filtering a bit simpler
df1 = df1.set_index('sub')
df2 = df2.set_index('sub')
Then pd.concat them together:
# keep the df1 rows whose sub is not in df2, then append all of df2
df3 = pd.concat([df1[~df1.index.isin(df2.index)], df2])
Output:
print(df3)
iss rem own
sub
site2 enc2 3 brian
site3 enc3 5 cody
data1 NaN 2 david
data2 NaN 4 edger
site1 NaN 6 fred
This does not set rem and iss for site1 back to the values from df1, though. If that is also needed, you could add an additional loc statement, like this:
df3.loc[(df3.index.isin(df1.index.to_list())) & ~(df3['rem'].isin(df1['rem'].to_list())), ['iss','rem']] = df1[['iss','rem']]
Final output:
iss rem own
sub
site2 enc2 3 brian
site3 enc3 5 cody
data1 NaN 2 david
data2 NaN 4 edger
site1 enc1 1 fred
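For what it's worth, that restore step could probably also be written with DataFrame.update, which aligns on the shared sub index and only copies over non-NA values. A sketch (note that update may upcast rem to float here):
# overwrite iss/rem in df3 with df1's values for the subs they share;
# own and the data1/data2 rows are left untouched
df3.update(df1[['iss', 'rem']])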
CodePudding user response:
Edit: changed to using update instead of fillna, as per @bkeesey's comment.
You need to merge on sub, update the overlapping columns with the non-NaN values coming from df2, and then drop the suffixed helper columns. Try:
import pandas as pd
df1 = pd.DataFrame({'sub': ['site1', 'site2', 'site3'], 'iss': ['enc1', 'enc2', 'enc3'], 'rem': [1, 3, 5], 'own': ['andy', 'brian', 'cody']})
df2 = pd.DataFrame({'sub': ['data1', 'data2', 'site1'], 'rem': [2, 4, 6], 'own': ['david', 'edger', 'fred']})
# keep df1's column names and suffix df2's duplicates with "_y"
dfm = df1.merge(df2, on='sub', how='outer', suffixes=["", "_y"])
# overwrite own/rem with the non-NaN values coming from df2
dfm.own.update(dfm.own_y)
dfm.rem.update(dfm.rem_y)
del dfm["own_y"]
del dfm["rem_y"]
Result:
sub iss rem own
0 site1 enc1 6.0 fred
1 site2 enc2 3.0 brian
2 site3 enc3 5.0 cody
3 data1 NaN 2.0 david
4 data2 NaN 4.0 edger
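If you would rather avoid the in-place Series.update calls (which may not propagate back to the DataFrame under pandas' copy-on-write behaviour), roughly the same result should be reachable with plain column assignment and combine_first; a sketch using the same merge as above:
dfm = df1.merge(df2, on='sub', how='outer', suffixes=["", "_y"])
# take df2's value where it exists, otherwise keep df1's
dfm["own"] = dfm["own_y"].combine_first(dfm["own"])
dfm["rem"] = dfm["rem_y"].combine_first(dfm["rem"])
dfm = dfm.drop(columns=["own_y", "rem_y"])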
CodePudding user response:
Here is one way to do it:
# update the df1.own with the values for it in the df2
# using map
df1['own'] = df1['sub'].map(df2.set_index('sub')['own']).fillna(df1['own'])
out=(pd.concat([df1, df2]) # concat the two DF
.drop_duplicates(subset=['sub']) # drop duplicates
.reset_index() # reset index
.drop(columns='index')) # remove the unwanted column
out
sub iss rem own
0 site1 enc1 1 fred
1 site2 enc2 3 brian
2 site3 enc3 5 cody
3 data1 NaN 2 david
4 data2 NaN 4 edger
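The map/fillna line works because df2.set_index('sub')['own'] is a sub → own lookup Series: map replaces each sub with the matching own value (NaN where there is no match) and fillna falls back to the existing own. If a plain dict reads more easily, the same lookup could presumably be written as:
# hypothetical equivalent of the map/fillna line above, using a dict lookup
own_by_sub = dict(zip(df2['sub'], df2['own']))   # {'data1': 'david', 'data2': 'edger', 'site1': 'fred'}
df1['own'] = df1['sub'].map(own_by_sub).fillna(df1['own'])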
Alternatively:
# concat the two DF and drop the duplicates
out=(pd.concat([df1, df2])
.drop_duplicates(subset=['sub'])
.reset_index()
.drop(columns='index'))
# map own from df2 onto the concatenated DF, keeping the existing value where there is no match
out['own'] = out['sub'].map(df2.set_index('sub')['own']).fillna(out['own'])
out
sub iss rem own
0 site1 enc1 1 fred
1 site2 enc2 3 brian
2 site3 enc3 5 cody
3 data1 NaN 2 david
4 data2 NaN 4 edger
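A note on why this works: drop_duplicates keeps the first occurrence by default, and df1 comes first in the concat, so df1's row wins for site1. A sketch with that choice spelled out and the index reset in one step:
out = (pd.concat([df1, df2])
       .drop_duplicates(subset=['sub'], keep='first')   # df1's site1 row is the one kept
       .reset_index(drop=True))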