I would like to in a single "chained" statement merge two tables, and label in the resulting table those entries from the left table vs right table. Pandas defaults this to "left_only", "right_only", "both" and can be seen by invoking indicator = True in the merge statement.However, I would like to change these to something more helpful via method chaining in Pandas.
This is what I am currently doing - rather inelegantly- without chaining of the merge result.
oran = pd.DataFrame(columns = ['Name', 'type'], data = [['oranger', 'FP'],
['oranged', 'CP'], ['orangeo', 'CP'], ['orangel', 'CP'], ['orangey', 'NP']])
other = pd.DataFrame(columns = ['Name', 'type'], data = [['appler', 'FP'],
['appled', 'CP'], ['appleo', 'CP'], ['orangel', 'CP'], ['orangey', 'NP']])
# Merge two dataframe above; add indicator column for table source
df = pd.merge(oran['Name'], other['Name'],on = 'Name', how = 'outer', indicator = True)
dict_map = {'both':'Both','left_only':'oran only',
'right_only':'other only'}
df["indicator"] = df["_merge"].map(dict_map).
df.drop("_merge", axis = 1, inplace = True)
df
The above approach gives me the output desired. However, the code involves multiple steps. How can I generate the same output as shown below via method chaining in Pandas?
CodePudding user response:
Setting directly the correct name for "indicator" and a simple assign
would work here:
df = (pd.merge(oran['Name'], other['Name'], on='Name', how='outer', indicator='indicator')
.assign(indicator=lambda d: d['indicator'].map(dict_map))
)
If we imagine that setting up the name was not possible, we could drop and return the column with pop
:
df = (pd.merge(oran['Name'], other['Name'], on='Name', how='outer', indicator=True)
.assign(indicator=lambda d: d.pop('_merge').map(dict_map))
)
Output:
Name indicator
0 oranger oran only
1 oranged oran only
2 orangeo oran only
3 orangel Both
4 orangey Both
5 appler other only
6 appled other only
7 appleo other only