I am trying to add some DataFrames that contain NaN values. The DataFrames are indexed by timestamp, and in my case a NaN is meaningful: it means that a measurement wasn't taken. So if all the DataFrames I'm adding have a NaN for a given timestamp, I need the result to have a NaN for that timestamp. But if one or more of them have a value for the timestamp, I need the sum of these values.
EDIT: Also, in my case a 0 is different from a NaN: it means that there was a measurement and it measured 0 activity, whereas a NaN means that there was no measurement. So any solution using fillna(0) won't work.
I haven't found a proper way to do this yet. Here is an example of what I want to do:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'value': [0, 1, 1, 1, np.nan, np.nan, np.nan]},
                   index=pd.date_range("01/01/2020 00:00", "01/01/2020 01:00", freq='10T'))
df2 = pd.DataFrame({'value': [0, 5, 5, 5, 5, 5, np.nan]},
                   index=pd.date_range("01/01/2020 00:00", "01/01/2020 01:00", freq='10T'))
df1 + df2
What I get:
df1 + df2
value
2020-01-01 00:00:00 0.0
2020-01-01 00:10:00 6.0
2020-01-01 00:20:00 6.0
2020-01-01 00:30:00 6.0
2020-01-01 00:40:00 NaN
2020-01-01 00:50:00 NaN
2020-01-01 01:00:00 NaN
What I would like to get as a result:
value
2020-01-01 00:00:00 0.0
2020-01-01 00:10:00 6.0
2020-01-01 00:20:00 6.0
2020-01-01 00:30:00 6.0
2020-01-01 00:40:00 5.0
2020-01-01 00:50:00 5.0
2020-01-01 01:00:00 NaN
Does anybody know a clean way to do this?
Thank you.
(I'm using Python 3.9.1 and pandas 1.2.4)
CodePudding user response:
You can use add with the fill_value=0 option. This keeps the "all NaN" combinations as NaN:
df1.add(df2, fill_value=0)
output:
value
2020-01-01 00:00:00 0.0
2020-01-01 00:10:00 6.0
2020-01-01 00:20:00 6.0
2020-01-01 00:30:00 6.0
2020-01-01 00:40:00 5.0
2020-01-01 00:50:00 5.0
2020-01-01 01:00:00 NaN
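Since the question mentions adding several DataFrames, here is a minimal sketch of how the same add call might be folded over a list with functools.reduce. The three frames and their values are made up for illustration and are not from the question; freq="10min" is used here as the spelled-out form of '10T'.

```python
from functools import reduce

import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01 00:00", periods=4, freq="10min")

# Three hypothetical frames; NaN means "no measurement", 0 is a real reading.
frames = [
    pd.DataFrame({"value": [0, 1, np.nan, np.nan]}, index=idx),
    pd.DataFrame({"value": [0, 5, 5, np.nan]}, index=idx),
    pd.DataFrame({"value": [np.nan, 2, np.nan, np.nan]}, index=idx),
]

# Fold the list pairwise. fill_value=0 only replaces a NaN when the other
# operand has a value, so a slot that is NaN on both sides stays NaN.
total = reduce(lambda left, right: left.add(right, fill_value=0), frames)
print(total)
```

The last timestamp is NaN in every frame, so it should stay NaN in the total, while the first one keeps its measured 0.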
CodePudding user response:
Does this solve your problem?
df = df1.fillna(0) + df2.fillna(0)
# Note: this also turns sums that are genuinely 0 into NaN
df[df['value'] == 0.0] = np.nan
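If measured zeros have to stay distinct from missing readings, one possible alternative (a sketch, not something proposed in the thread) is to stack the frames and sum per timestamp with min_count=1, which yields NaN only when no non-NaN value is present. The data below mirrors the question's example, with freq="10min" as the spelled-out form of '10T'.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01 00:00", periods=7, freq="10min")
df1 = pd.DataFrame({"value": [0, 1, 1, 1, np.nan, np.nan, np.nan]}, index=idx)
df2 = pd.DataFrame({"value": [0, 5, 5, 5, 5, 5, np.nan]}, index=idx)

# Stack the frames vertically, then sum the rows that share a timestamp.
# min_count=1 makes a group's sum NaN unless it has at least one real value.
result = pd.concat([df1, df2]).groupby(level=0).sum(min_count=1)
print(result)
```

This also extends to any number of frames by adding them to the list passed to pd.concat.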