I have 2 DataFrames: 'data_test' and 'data'. I need to add column 'final_output_ratio' to data_test, but only if value of column 'date' is the same for both (so I need to add only 3 values from data). DataFrames are:
data_test={'date':['2016-09-01 00:59:59','2016-09-01 01:59:59','2016-09-01 02:59:59'],
'stage_1_output':[0.88,0.91,0.82],
'stage_2_output':[0.91,0.95,0.87]}
data_test=pd.DataFrame(data=data_test)
data_test
date stage_1_output stage_2_output
0 2016-09-01 00:59:59 0.88 0.91
1 2016-09-01 01:59:59 0.91 0.95
2 2016-09-01 02:59:59 0.82 0.87
data={'date':['2016-09-01 00:59:59','2016-09-01 01:59:59','2016-09-01 02:59:59','2017-09-01 02:59:59','2017-09-01 03:14:59'],
'stage_1_output':[0.88,0.91,0.82,0.88,0.92],
'stage_2_output':[0.91,0.95,0.87,0.85,0.9],
'final_output_ratio':[0.22,0.17,0.14,0.18,0.24] }
data=pd.DataFrame(data=data)
date stage_1_output stage_2_output final_output_ratio
0 2016-09-01 00:59:59 0.88 0.91 0.22
1 2016-09-01 01:59:59 0.91 0.95 0.17
2 2016-09-01 02:59:59 0.82 0.87 0.14
3 2017-09-01 02:59:59 0.88 0.85 0.18
4 2017-09-01 03:14:59 0.92 0.90 0.24
I am trying this:
data_test['final_output_ratio']=data['final_output_ratio'].loc[data['date']==data_test['date']]
And get an error: ValueError: Can only compare identically-labeled Series objects
What can solve the problem?
CodePudding user response:
Use pd.merge
on date
with how='left'
parameter:
>>> pd.merge(data_test, data[['date', 'final_output_ratio']], how='left', on='date')
date stage_1_output stage_2_output final_output_ratio
0 2016-09-01 00:59:59 0.88 0.91 0.22
1 2016-09-01 01:59:59 0.91 0.95 0.17
2 2016-09-01 02:59:59 0.82 0.87 0.14