Simplified dfs:
df = pd.DataFrame(
{
"ID": [6, 2, 4],
"ID2": [2, 3, 4],
"foo": [2, 3, 4],
}
)
df2 = pd.DataFrame(
{
"ID": [6, 2, 4],
"ID2": [1, 3, 2],
"foo": [2, 3, 4],
"to_ignore": ["idk", "bar", "whatever"],
"A": ["unwanted str", "desired str", "unwanted str"],
}
)
We want to add a column to df
which has values from df2
's A
column where values from columns ID
and ID2
match like so:
ID ID2 NewCol
0 6 2 NaN
1 2 3 desired str
2 4 4 NaN
We can do this where one column's values match:
df["NewCol"] = df["ID"].map(df2.drop_duplicates("ID").set_index("ID")["A"])
How can we make this only work for entries where both ID
and ID2
match? df["NewCol"] = df["ID", "ID2"].map(df2.drop_duplicates("ID").set_index(["ID", "ID2"])["A"])
obviously doesn't work.
We could concatenate the ID cols and crudely use this above approach while risking concatenated ID strs incorrectly matching:
for d in [df, df2]:
d['lookup_combined'] = d["ID"] d["ID2"]
df["NewCol"] = df["lookup_combined"].map(df2.drop_duplicates("lookup_combined").set_index("lookup_combined")["A"])
for d in [df, df2]:
d.drop(columns="lookup_combined", inplace=True)
But this is clunky and error prone. Can we do better?
Note we don't want to merge / join the tables as there are more columns and other complications removed for question simplicity. We only want a new column named by a string we supply.
CodePudding user response:
You are looking for the function merge. You just need to indicate the columns on which you are merging on as follows:
df.merge(df2, how='left', on = ['ID', 'ID2'])
ID ID2 A
0 6 2 NaN
1 2 3 desired str
2 4 4 NaN
Edit:
I still do not see where merge
is an issue: with new data, just do:
vars = ['ID', 'ID2']
df[vars].merge(df2[vars ['A']], how='left', on =vars)
ID ID2 A
0 6 2 NaN
1 2 3 desired str
2 4 4 NaN