Home > database >  pandas merged multiple dataframes of different size and columns
pandas merged multiple dataframes of different size and columns

Time:04-15

I have many dataframes from 0 to N that need to merge. They all has one common column called 'NAME'. I tried different ways on merge and concat but dataframes doesn't return the format I need

These are some samples of my data

d1 = {'col1': [1,1], 'col2': [1,1], 'NAME': ["A","A"]}
d2 = {'col3': [1,1], 'col4': [1,1], 'NAME': ["A","A"]}
d3 = {'col1': [2,2], 'col2': [2,2], 'NAME': ["B","B"]}
d4 = {'col3': [2,2], 'col4': [2,2], 'NAME': ["B","B"]}
d5 = {'col5': []}
d6 = {'col5': []}

df_list = [d1, d2, d3, d4, d5, d6]

test_df = []
for item in df_list:
    temp_df = pd.DataFrame(data=item)
    test_df.append(temp_df)

merged = reduce(lambda left, right: pd.merge(left, right, on="NAME", how='outer'),test_df)

The above code also give me an error: KeyError: 'NAME'

I expect my merged result to be

   col1 col2 col3 col4 NAME
0     1    1    1    1    A
1     1    1    1    1    A
0     2    2    2    2    B
1     2    2    2    2    B

CodePudding user response:

Assuming unique NAME values per dataframe, I would use pandas.concat here:

(pd
 .concat(test_df).rename_axis('index')
 .groupby(['NAME', 'index']).first()
 .reset_index('NAME')
)

output:

      NAME  col1  col2  col3  col4
index                             
0        A   1.0   1.0   1.0   1.0
1        A   1.0   1.0   1.0   1.0
0        B   2.0   2.0   2.0   2.0
1        B   2.0   2.0   2.0   2.0

CodePudding user response:

d1 = {'col1': [1,1], 'col2': [1,1], 'NAME': ["A","A"]}
d2 = {'col3': [1,1], 'col4': [1,1], 'NAME': ["A","A"]}
d3 = {'col1': [2,2], 'col2': [2,2], 'NAME': ["B","B"]}
d4 = {'col3': [2,2], 'col4': [2,2], 'NAME': ["B","B"]}
d5 = {'col5': []}
d6 = {'col5': []}

df_list = [d1, d2, d3, d4]

test_df = []
for item in df_list:
    temp_df = pd.DataFrame(data=item)
    test_df.append(temp_df)

merged = reduce(lambda left, right: pd.merge(left, right, on="NAME", how='outer'),test_df)
merged["col5"] = None
col1_x col2_x NAME col3_x col4_x col1_y col2_y col3_y col4_y col5
0 1 1 A 1 1 nan nan nan nan
1 1 1 A 1 1 nan nan nan nan
2 1 1 A 1 1 nan nan nan nan
3 1 1 A 1 1 nan nan nan nan
4 nan nan B nan nan 2 2 2 2
5 nan nan B nan nan 2 2 2 2
6 nan nan B nan nan 2 2 2 2
7 nan nan B nan nan 2 2 2 2

pandas.DataFrame.merge

If you just want to expand one column, then you can add a separate column without merge

KeyError: 'NAME'

  • Related