Home > other >  Merge three different dataframes in Python
Merge three different dataframes in Python

Time:07-06

I want to merge three data frames in Python, the code I have now provide me with some wrong outputs.

This is the first data frame

 df_1
   Year    Month        X_1    Y_1
 0  2021    January     $90     $100
 1  2021    February    NaN     $120
 2  2021    March       $100    $130
 3  2021    April       $110    $140
 4  2021    May         Nan     $150
 5  2019    June        $120    $160
   

This is the second data frame

  df_2
    Year    Month       X_2     Y_2
  0 2021    January     Nan     $120
  1 2021    February    NaN     $130
  2 2021    March       $80     $140
  3 2021    April       $90     $150
  4 2021    May         Nan     $150
  5 2021    June        $120    $170

This is the third data frame

   df_3
     Year    Month      X_3     Y_3
  0 2021    January     $110    $150
  1 2021    February    $140    $160
  2 2021    March       $97     $170
  3 2021    April       $90     $180
  4 2021    May         Nan     $190
  5 2021    June        $120    $200

The idea is to combine them into one data frame like this:

 df_combined
    Year    Month       X_1    Y_1    X_2    Y_2    X_3    Y_3
0   2019    January     $90     $100  NaN    $120   $110    $150 
1   2019    February    NaN     $120  NaN    $130   $140    $160
2   2019    March       $100    $130  $80    $140   $97     $170
3   2019    April       $110    $140  $90    $150   $90     $180
4   2019    May         Nan     $150  Nan    $150   Nan     $190
5   2019    June        $120    $160  $120   $170   $120    $200

The code I have for now does not give me the correct outcome, only df_3 has to the correct numbers.

 # compile the list of data frames you want to merge
 import functools as ft
 from functools import reduce
    
 data_frames = [df_1, df_2, df_3]
 df_merged = reduce(lambda  cross, right: pd.merge(cross,right,on=['Year'],
                                                           how='outer'),data_frames)
  #remove superfluous columns
  df_merged.drop(['Month_x', 'Month_y'], axis=1, inplace=True)

CodePudding user response:

You can try with

df_1.merge(df_2, how='left', on=['Year', 'Month']).merge(df_3, how='left', on=['Year', 'Month'])

CodePudding user response:

One option of probably many is to do

from functools import reduce
import pandas as pd

idx = ["Year", "Month"]
new_df = reduce(pd.DataFrame.join, (i.set_index(idx) for i in dataframes)).reset_index()

or

reduce(lambda x, y: pd.merge(x, y, how="outer", on=["Year", "Month"]), dataframes)
  • Related