Merging two pandas dataframes with date variable

Time:05-20

I want to merge two pandas dataframes on a common date variable. Below is my code:

import pandas as pd
data = pd.DataFrame({'date' : pd.to_datetime(['2010-12-31', '2012-12-31']), 'val' : [1,2]})
datarange = pd.DataFrame(pd.period_range('2009-12-31', '2012-12-31', freq='A'), columns = ['date'])
pd.merge(datarange, data, how = 'left', on = 'date')

With this I get the following result:

   date  val
0  2009  NaN
1  2010  NaN
2  2011  NaN
3  2012  NaN

Could you please help me merge these two dataframes correctly?

Answer:

Use right_on to convert data['date'] to the same annual periods as the datarange['date'] column:

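# right_on gets a Series: data['date'] converted to annual periods,
# so its values have the same period dtype as datarange['date']
# ('A' is the annual alias; recent pandas versions prefer 'Y')
df = pd.merge(datarange,
              data,
              how='left',
              left_on='date',
              right_on=data['date'].dt.to_period('A'))
print(df)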
   date date_x     date_y  val
0  2009   2009        NaT  NaN
1  2010   2010 2010-12-31  1.0
2  2011   2011        NaT  NaN
3  2012   2012 2012-12-31  2.0

Or create a helper column so the original timestamps are preserved:

# copy the timestamps to 'datetimes', then turn 'date' into annual periods
df = pd.merge(datarange,
              data.assign(datetimes=data['date'], date=data['date'].dt.to_period('A')),
              how='left',
              on='date')
print(df)
   date  val  datetimes
0  2009  NaN        NaT
1  2010  1.0 2010-12-31
2  2011  NaN        NaT
3  2012  2.0 2012-12-31
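To confirm which rows actually found a match, the helper-column approach can be combined with the indicator=True option of pd.merge, which adds a _merge column. This is a minimal sketch, assuming a recent pandas where 'Y' is the preferred annual alias (the question uses the older 'A'); the variable name right is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'date': pd.to_datetime(['2010-12-31', '2012-12-31']), 'val': [1, 2]})
# 'Y' is assumed as the annual alias (older code uses 'A')
datarange = pd.DataFrame(pd.period_range('2009-12-31', '2012-12-31', freq='Y'), columns=['date'])

# Helper column: keep the original timestamps, replace 'date' with annual periods
right = data.assign(datetimes=data['date'], date=data['date'].dt.to_period('Y'))

# indicator=True adds a '_merge' column: 'both' for matched rows, 'left_only' otherwise
df = pd.merge(datarange, right, how='left', on='date', indicator=True)
print(df)
```

Rows marked left_only are the years in datarange with no entry in data; a column that is all left_only would point back to a key-type mismatch like the one in the question.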

Answer:

You need to merge on a common type.
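A minimal sketch that makes the mismatch visible by printing the dtypes of the two key columns, then checks that converting the timestamps with dt.to_period gives the same dtype as the left side. It assumes 'Y' as the annual alias (the question uses the older 'A'); the variable name converted is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'date': pd.to_datetime(['2010-12-31', '2012-12-31']), 'val': [1, 2]})
datarange = pd.DataFrame(pd.period_range('2009-12-31', '2012-12-31', freq='Y'), columns=['date'])

# The two key columns hold different kinds of values
print(datarange['date'].dtype)  # an annual period dtype
print(data['date'].dtype)       # a datetime64 dtype

# Converting the timestamps to annual periods gives both keys the same dtype
converted = data['date'].dt.to_period('Y')
print(converted.dtype == datarange['date'].dtype)  # True
print(set(converted) <= set(datarange['date']))    # True: every year exists on the left
```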

For example, you can use the year as the merge key on each side:

pd.merge(datarange, data, how='left',
         left_on=datarange['date'].dt.year,
         right_on=data['date'].dt.year
        )

Output:

   key_0 date_x     date_y  val
0   2009   2009        NaT  NaN
1   2010   2010 2010-12-31  1.0
2   2011   2011        NaT  NaN
3   2012   2012 2012-12-31  2.0
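A variation on the same idea, as a sketch: store the year in a named column on each side and merge on that name. This avoids the anonymous key_0 column and lets suffixes label the two original date columns. The column name year and the suffixes are illustrative choices, and 'Y' is assumed as the annual alias.

```python
import pandas as pd

data = pd.DataFrame({'date': pd.to_datetime(['2010-12-31', '2012-12-31']), 'val': [1, 2]})
datarange = pd.DataFrame(pd.period_range('2009-12-31', '2012-12-31', freq='Y'), columns=['date'])

# Build an integer 'year' key on both sides so the merge compares plain integers
left = datarange.assign(year=datarange['date'].dt.year)
right = data.assign(year=data['date'].dt.year)

# Merge on the named key; suffixes distinguish the period and datetime 'date' columns
result = pd.merge(left, right, how='left', on='year', suffixes=('_period', '_datetime'))
print(result)
```

The integer widths of the two year columns may differ between pandas versions, but pandas allows merging integer keys of different widths.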