Home > Back-end >  How do I merge different Dataframes and label each dataframe in the merged dataframe in python?
How do I merge different Dataframes and label each dataframe in the merged dataframe in python?

Time:11-29

lets say for example we have 2 Dataframes, df1 and df2;

df1 = pd.DataFrame({'id': ['A01', 'A02'],
                    'Name': ['ABC', 'PQR']})
df2 = pd.DataFrame({'id': ['B05', 'B06'],
                    'Name': ['XYZ', 'TUV']})

I want to merge the two and label each dataframes, so it appears like this.

enter image dwescription here

So basically, i want to concatenate two dataframes into a new dataframe and create a third column that labels each of those dataframes. As seen the the table above, you can see that there is a 3rd column named 'class' and the values there are grouping of each dataframes that were merged. The first two above are data for df1 and it was labelled as 1 for all of them. it groups all of them and put them as one.

i'm trying to make sure it doesn't appear like this one below; enter image description here

in this case, it's appending for each line.. i prefer to append to the whole DF as single entity as shown in the first table.

This is what I have tried;

 df1['class'] = 1
 df2['class'] = 2

 df_merge = pd.concat([df1,df2])

and i got result like this enter image description here

But this is not what I was expecting. I am expecting the result to look like this. Grouping each df as one and add the 3rd column. enter image description here

CodePudding user response:

You can concat using the keys and names parameters, then reset_index:

(pd.concat([df1, df2], keys=[1, 2], names=['class', None])
   .reset_index('class')
)

Output:

   class   id Name
0      1  A01  ABC
1      1  A02  PQR
0      2  B05  XYZ
1      2  B06  TUV

Or without reset_index to get a MultiIndex:

pd.concat([df1, df2], keys=[1, 2], names=['class', None])

          id Name
class            
1     0  A01  ABC
      1  A02  PQR
2     0  B05  XYZ
      1  B06  TUV

hiding the "duplicated" class:

(pd.concat([df1, df2], keys=[1, 2], names=['class', None])
   .reset_index('class')
      .assign(**{'class': lambda d: d['class'].mask(d['class'].duplicated(), '')})
)

Output:

  class   id Name
0     1  A01  ABC
1        A02  PQR
0     2  B05  XYZ
1        B06  TUV
  • Related