My example master dataset is the following:
df1 = pd.DataFrame(
data=
[['dfs','cat', 'giraffe', 'fish'],
['axs','dog', 'pig', 'bird'],
['fgd','cow', 'lion', 'bull'],
['rty','mouse', 'elephant', 'sheep'],
['pwe', 'fox', 'tiger', 'horse']],
columns=['othervar', 'makevar0', 'makevar1', 'makevar2']
)
print(df1)
The other dataset is as follows:
df2 = pd.DataFrame(
data=
[['cat', 34],
['pig', 23],
['cow', 45],
['elephant', 86],
['horse', 90]],
columns=['varx', 'targetvar']
)
print(df2)
I want to merge df1 and df2 like follows:
merged = pd.merge(df1, df2[['varx', 'targetvar']], left_on='makevar0',
right_on='varx',
how='left', validate='m:1')
print(merged)
The issue is I want to do this for all makevar0, makeover1, and makevar2 in the master dataset; but there may be more than 3 such variables. That is the master dataset may have 2,3,4 or any number of makevars (not very large number and certainly below 10). However, all these will be named starting with 'makevar' followed by a number. I want to merge all the makevars with the varx in 2nd dataset to get targetvar mapped in the master data in a loop. The 2nd dataframe does not change at all.
Any help is appreciated so that I don't have to manually check the number of makevars and write the same line of codes many times.
CodePudding user response:
I would suggest using Series.map
to substitute the values in each makevar
column:
# create a mapping series
s = df2.set_index('varx')['targetvar']
# Substitute values in each makevar like column
for c in df1.filter(like='makevar'):
df1[c.replace('make', 'target')] = df1[c].map(s)
Result
othervar makevar0 makevar1 makevar2 targetvar0 targetvar1 targetvar2
0 dfs cat giraffe fish 34.0 NaN NaN
1 axs dog pig bird NaN 23.0 NaN
2 fgd cow lion bull 45.0 NaN NaN
3 rty mouse elephant sheep NaN 86.0 NaN
4 pwe fox tiger horse NaN NaN 90.0