Consider the following DataFrame and Series:
>>> base_data = pd.DataFrame({'out_temp': {Timestamp('2021-09-01 05:40:00 0000', tz='UTC'): 15.0, Timestamp('2021-09-01 05:50:00 0000', tz='UTC'): 10.0, Timestamp('2021-09-01 06:00:00 0000', tz='UTC'): 18.0, Timestamp('2021-09-01 06:10:00 0000', tz='UTC'): 16.0, Timestamp('2021-09-01 06:20:00 0000', tz='UTC'): 9.0, Timestamp('2021-09-01 06:30:00 0000', tz='UTC'): 8.0, Timestamp('2021-09-01 06:40:00 0000', tz='UTC'): 14.0, Timestamp('2021-09-01 06:50:00 0000', tz='UTC'): 10.0, Timestamp('2021-09-01 07:00:00 0000', tz='UTC'): 13.0, Timestamp('2021-09-01 07:10:00 0000', tz='UTC'): 17.0}, 'out_humidity': {Timestamp('2021-09-01 05:40:00 0000', tz='UTC'): 95.0, Timestamp('2021-09-01 05:50:00 0000', tz='UTC'): 95.0, Timestamp('2021-09-01 06:00:00 0000', tz='UTC'): 93.0, Timestamp('2021-09-01 06:10:00 0000', tz='UTC'): 92.0, Timestamp('2021-09-01 06:20:00 0000', tz='UTC'): 91.0, Timestamp('2021-09-01 06:30:00 0000', tz='UTC'): 92.0, Timestamp('2021-09-01 06:40:00 0000', tz='UTC'): 93.0, Timestamp('2021-09-01 06:50:00 0000', tz='UTC'): 93.0, Timestamp('2021-09-01 07:00:00 0000', tz='UTC'): 89.0, Timestamp('2021-09-01 07:10:00 0000', tz='UTC'): 90.0}})
>>> base_data
out_temp out_humidity
datetime
2021-09-01 05:40:00 00:00 15.0 95.0
2021-09-01 05:50:00 00:00 10.0 95.0
2021-09-01 06:00:00 00:00 18.0 93.0
2021-09-01 06:10:00 00:00 16.0 92.0
2021-09-01 06:20:00 00:00 9.0 91.0
2021-09-01 06:30:00 00:00 8.0 92.0
2021-09-01 06:40:00 00:00 14.0 93.0
2021-09-01 06:50:00 00:00 10.0 93.0
2021-09-01 07:00:00 00:00 13.0 89.0
2021-09-01 07:10:00 00:00 17.0 90.0
>>> updated = pd.Series({Timestamp('2021-09-01 06:30:00 0000', tz='UTC'): 8.0, Timestamp('2021-09-01 06:40:00 0000', tz='UTC'): 14.0, Timestamp('2021-09-01 06:50:00 0000', tz='UTC'): nan, Timestamp('2021-09-01 07:00:00 0000', tz='UTC'): 13.0, Timestamp('2021-09-01 07:10:00 0000', tz='UTC'): nan})
>>> updated
datetime
2021-09-01 06:30:00 00:00 8.0
2021-09-01 06:40:00 00:00 14.0
2021-09-01 06:50:00 00:00 NaN
2021-09-01 07:00:00 00:00 13.0
2021-09-01 07:10:00 00:00 NaN
Name: out_temp, dtype: float64
The DataFrame contains column, out_temp, which is the same as the name of the Series. The index is the same, however, the Series is missing some index values. I want to combine these two together, replacing all values in the DataFrame by the values that are different in the Series. So in this case, I want the following result:
out_temp out_humidity
datetime
2021-09-01 05:40:00 00:00 15.0 95.0
2021-09-01 05:50:00 00:00 10.0 95.0
2021-09-01 06:00:00 00:00 18.0 93.0
2021-09-01 06:10:00 00:00 16.0 92.0
2021-09-01 06:20:00 00:00 9.0 91.0
2021-09-01 06:30:00 00:00 8.0 92.0
2021-09-01 06:40:00 00:00 14.0 93.0
2021-09-01 06:50:00 00:00 NaN 93.0
2021-09-01 07:00:00 00:00 13.0 89.0
2021-09-01 07:10:00 00:00 NaN 90.0
The temperature got replaced by the NaN values in the Series.
CodePudding user response:
Because you have nan
, you have to subsetting your original dataframe to update values:
base_data.loc[updated.index, 'out_temp'] = updated
print(base_data)
# Output
out_temp out_humidity
2021-09-01 05:40:00 00:00 15.0 95.0
2021-09-01 05:50:00 00:00 10.0 95.0
2021-09-01 06:00:00 00:00 18.0 93.0
2021-09-01 06:10:00 00:00 16.0 92.0
2021-09-01 06:20:00 00:00 9.0 91.0
2021-09-01 06:30:00 00:00 8.0 92.0
2021-09-01 06:40:00 00:00 14.0 93.0
2021-09-01 06:50:00 00:00 NaN 93.0
2021-09-01 07:00:00 00:00 13.0 89.0
2021-09-01 07:10:00 00:00 NaN 90.0