Home > Enterprise >  How to get common index of many dataframes and return their rows and dataframes they are from?
How to get common index of many dataframes and return their rows and dataframes they are from?

Time:10-04

I have got 7 dataframes with same column names in python, but row indices are largely different and only few of the row indices match.

df1 
      col1 col2 col3
a     123  456  786 
b     121  454  787
c     111  444  777

df2 
      col1 col2 col3
d     13  46    86
e     11  44    87
c     11  44    77

df3 
      col1  col2  col3
d     1231  4567  1786
g     1214  4546  1787
h     1115  4445  1777
c     12    12    10

Answer should be:

newdf
newcol col1  col2  col3  dfcol
c      111   444   777   1
c      11    44    77    2
c      12    12    10    3
d      13    46    86    2
d      1231  4567  1786  3

It is similar to this How to get the common index of two pandas dataframes? but not exactly.

CodePudding user response:

IIUC

This is my beginner way of doing it.

import pandas as pd

Create dataframe

df1 = pd.DataFrame({
  'id': ['a', 'b', 'c'],
  'col1': [123, 121, 111],
  'col2': [456, 454, 444],
  'col3': [786, 787, 777],
})

df2 = pd.DataFrame({
  'id': ['d', 'e', 'c'],
  'col1': [13, 46, 86],
  'col2': [11, 44, 87],
  'col3': [11, 44, 77],
})

df3 = pd.DataFrame({
  'id': ['d', 'g', 'h', 'c'],
  'col1': [1231, 1214, 1115, 12],
  'col2': [4567, 4546, 4445, 12],
  'col3': [1786, 1787, 1777, 10],
})

Create dataframe list as well as giving name to each dataframe

df_collection = [df1, df2, df3]
df1.name = 1
df2.name = 2
df3.name = 3

Create list:

id_col = [[df.loc[i, 'id'], df.loc[i, 'col1'], df.loc[i, 'col2'], df.loc[i, 'col3'], df.name] for df in df_collection for i in range(len(df))]

Construct new dataframe:

df = pd.DataFrame(id_col)
df.columns = ['id', 'col1', 'col2', 'col3', 'df']

Remove rows with only 1 occurence:

new_df = df[df.groupby('id').id.transform('count') > 1].sort_values('id')

This will give us the following:

    id  col1   col2   col3  df
2   c   111    444    777   1
5   c   86     87     77    2
9   c   12     12     10    3
3   d   13     11     11    2
6   d   1231   4567   1786  3
  • Related