I have many dataframes from 0 to N that need to merge. They all has one common column called 'NAME'. I tried different ways on merge and concat but dataframes doesn't return the format I need
These are some samples of my data
d1 = {'col1': [1,1], 'col2': [1,1], 'NAME': ["A","A"]}
d2 = {'col3': [1,1], 'col4': [1,1], 'NAME': ["A","A"]}
d3 = {'col1': [2,2], 'col2': [2,2], 'NAME': ["B","B"]}
d4 = {'col3': [2,2], 'col4': [2,2], 'NAME': ["B","B"]}
d5 = {'col5': []}
d6 = {'col5': []}
df_list = [d1, d2, d3, d4, d5, d6]
test_df = []
for item in df_list:
temp_df = pd.DataFrame(data=item)
test_df.append(temp_df)
merged = reduce(lambda left, right: pd.merge(left, right, on="NAME", how='outer'),test_df)
The above code also give me an error: KeyError: 'NAME'
I expect my merged result to be
col1 col2 col3 col4 NAME
0 1 1 1 1 A
1 1 1 1 1 A
0 2 2 2 2 B
1 2 2 2 2 B
CodePudding user response:
Assuming unique NAME values per dataframe, I would use pandas.concat
here:
(pd
.concat(test_df).rename_axis('index')
.groupby(['NAME', 'index']).first()
.reset_index('NAME')
)
output:
NAME col1 col2 col3 col4
index
0 A 1.0 1.0 1.0 1.0
1 A 1.0 1.0 1.0 1.0
0 B 2.0 2.0 2.0 2.0
1 B 2.0 2.0 2.0 2.0
CodePudding user response:
d1 = {'col1': [1,1], 'col2': [1,1], 'NAME': ["A","A"]}
d2 = {'col3': [1,1], 'col4': [1,1], 'NAME': ["A","A"]}
d3 = {'col1': [2,2], 'col2': [2,2], 'NAME': ["B","B"]}
d4 = {'col3': [2,2], 'col4': [2,2], 'NAME': ["B","B"]}
d5 = {'col5': []}
d6 = {'col5': []}
df_list = [d1, d2, d3, d4]
test_df = []
for item in df_list:
temp_df = pd.DataFrame(data=item)
test_df.append(temp_df)
merged = reduce(lambda left, right: pd.merge(left, right, on="NAME", how='outer'),test_df)
merged["col5"] = None
col1_x | col2_x | NAME | col3_x | col4_x | col1_y | col2_y | col3_y | col4_y | col5 | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | A | 1 | 1 | nan | nan | nan | nan | |
1 | 1 | 1 | A | 1 | 1 | nan | nan | nan | nan | |
2 | 1 | 1 | A | 1 | 1 | nan | nan | nan | nan | |
3 | 1 | 1 | A | 1 | 1 | nan | nan | nan | nan | |
4 | nan | nan | B | nan | nan | 2 | 2 | 2 | 2 | |
5 | nan | nan | B | nan | nan | 2 | 2 | 2 | 2 | |
6 | nan | nan | B | nan | nan | 2 | 2 | 2 | 2 | |
7 | nan | nan | B | nan | nan | 2 | 2 | 2 | 2 |
If you just want to expand one column, then you can add a separate column without merge
KeyError: 'NAME'