i have edited this post with the specific case:
i have a list of dataframes like this (note that df1 and df2 have a row in common)
df1
index | Date | A |
---|---|---|
0 | 2010-06-19 | 4 |
1 | 2010-06-20 | 3 |
2 | 2010-06-21 | 2 |
3 | 2010-06-22 | 1 |
4 | 2012-07-19 | 5 |
df2
index | Date | B |
---|---|---|
0 | 2012-07-19 | 5 |
1 | 2012-07-20 | 6 |
df3
index | Date | C |
---|---|---|
0 | 2020-06-19 | 5 |
1 | 2020-06-20 | 2 |
2 | 2020-06-21 | 9 |
df_list = [df1, df2, df3]
I would like to merge all dataframes in a single dataframe, without losing rows and placing nan where there are no things to merge. The criteria would be merging them by the column 'Date' (the column should have all the dates of all the merged dataframes, ordered by date).
The resulting dataframe should look like this:
Resulting Dataframe:
index | Date | A | B | C |
---|---|---|---|---|
0 | 2010-06-19 | 4 | nan | nan |
1 | 2010-06-20 | 3 | nan | nan |
2 | 2010-06-21 | 2 | nan | nan |
3 | 2010-06-22 | 1 | nan | nan |
4 | 2012-07-19 | 5 | 5 | nan |
5 | 2012-07-20 | nan | 6 | nan |
6 | 2020-06-19 | nan | nan | 5 |
7 | 2020-06-20 | nan | nan | 2 |
8 | 2020-06-21 | nan | nan | 9 |
I tried something like this:
from functools import reduce
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['Date'], how='outer'), df_list)
BUT the resulting dataframe is not as expected (i miss some columns and is not ordered by date). I think i am missing something.
Thank you very much
CodePudding user response:
Use pandas.concat()
. It takes a list of dataframes, and appends common columns together, filling new columns with NaN as necessary:
new_df = pd.concat([df1, df2, df3])
Output:
>>> new_df
index Date A B C
0 0 2010-06-19 4.0 NaN NaN
1 1 2010-06-20 3.0 NaN NaN
2 2 2010-06-21 2.0 NaN NaN
3 3 2010-06-22 1.0 NaN NaN
0 0 2012-07-19 NaN 5.0 NaN
1 1 2012-07-20 NaN 6.0 NaN
0 0 2020-06-19 NaN NaN 5.0
1 1 2020-06-20 NaN NaN 2.0
2 2 2020-06-21 NaN NaN 9.0
CodePudding user response:
SOLVED: for overlapping datas, i had to add the option: Sort = TRUE in the lambda function. Seemed i was missing the order for big dataframes and i was only seeng the nan at end and start of frames. Thank you all ;-)
from functools import reduce
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['Date'], how='outer', sort=True), df_list)