Home > Blockchain >  pandas outer join and improve pandas naming of left vs right table entries in resulting join
pandas outer join and improve pandas naming of left vs right table entries in resulting join

Time:01-11

I would like to in a single "chained" statement merge two tables, and label in the resulting table those entries from the left table vs right table. Pandas defaults this to "left_only", "right_only", "both" and can be seen by invoking indicator = True in the merge statement.However, I would like to change these to something more helpful via method chaining in Pandas.

This is what I am currently doing - rather inelegantly- without chaining of the merge result.

   oran = pd.DataFrame(columns = ['Name', 'type'], data = [['oranger', 'FP'],
                       ['oranged', 'CP'], ['orangeo', 'CP'], ['orangel', 'CP'], ['orangey', 'NP']])

   other =  pd.DataFrame(columns = ['Name', 'type'], data = [['appler', 'FP'],
                       ['appled', 'CP'], ['appleo', 'CP'], ['orangel', 'CP'],  ['orangey', 'NP']])


 # Merge two dataframe above; add indicator column for table source
   df = pd.merge(oran['Name'], other['Name'],on = 'Name', how = 'outer', indicator = True)

   dict_map = {'both':'Both','left_only':'oran only',
            'right_only':'other only'}

   df["indicator"] = df["_merge"].map(dict_map).

   df.drop("_merge", axis = 1, inplace = True)

   df

The above approach gives me the output desired. However, the code involves multiple steps. How can I generate the same output as shown below via method chaining in Pandas?

enter image description here

CodePudding user response:

Setting directly the correct name for "indicator" and a simple assign would work here:

df = (pd.merge(oran['Name'], other['Name'], on='Name', how='outer', indicator='indicator')
        .assign(indicator=lambda d: d['indicator'].map(dict_map))
      )

If we imagine that setting up the name was not possible, we could drop and return the column with pop:

df = (pd.merge(oran['Name'], other['Name'], on='Name', how='outer', indicator=True)
        .assign(indicator=lambda d: d.pop('_merge').map(dict_map))
      )

Output:

      Name   indicator
0  oranger   oran only
1  oranged   oran only
2  orangeo   oran only
3  orangel        Both
4  orangey        Both
5   appler  other only
6   appled  other only
7   appleo  other only
  • Related