I want to merger two pandas dataframes
based on common date
variable. Below is my code
import pandas as pd
data = pd.DataFrame({'date' : pd.to_datetime(['2010-12-31', '2012-12-31']), 'val' : [1,2]})
datarange = pd.DataFrame(pd.period_range('2009-12-31', '2012-12-31', freq='A'), columns = ['date'])
pd.merge(datarange, data, how = 'left', on = 'date')
With this I get below result
date val
0 2009 NaN
1 2010 NaN
2 2011 NaN
3 2012 NaN
Could you please help how can I correctly merge these two dataframes
?
CodePudding user response:
Use right_on
for same anual periods like in datarange['date']
column:
df = pd.merge(datarange,
data,
how = 'left',
left_on = 'date',
right_on=data['date'].dt.to_period('A'))
print (df)
date date_x date_y val
0 2009 2009 NaT NaN
1 2010 2010 2010-12-31 1.0
2 2011 2011 NaT NaN
3 2012 2012 2012-12-31 2.0
Or create helper column:
df = pd.merge(datarange,
data.assign(datetimes=data['date'], date=data['date'].dt.to_period('A')),
how = 'left',
on = 'date')
print (df)
date val datetimes
0 2009 NaN NaT
1 2010 1.0 2010-12-31
2 2011 NaN NaT
3 2012 2.0 2012-12-31
CodePudding user response:
You need to merge on a common type.
For example you can set the year as merging key on each side:
pd.merge(datarange, data, how='left',
left_on=datarange['date'].dt.year,
right_on=data['date'].dt.year
)
output:
key_0 date_x date_y val
0 2009 2009 NaT NaN
1 2010 2010 2010-12-31 1.0
2 2011 2011 NaT NaN
3 2012 2012 2012-12-31 2.0