Sum of dataframes : treating NaN as 0 when summed with other values, but returning NaN where all sum

Time:04-12

I am trying to add some dataframes that contain NaN values. The dataframes are indexed by timestamps, and in my case a NaN is meaningful: it means that a measurement wasn't done. So if all the dataframes I'm adding have a NaN for a given timestamp, I need the result to have a NaN for this timestamp. But if one or more dataframes have a value for the timestamp, I need the sum of these values.

EDIT: Also, in my case, a 0 is different from a NaN: it means that there was a measurement and it measured 0 activity, whereas a NaN means that there was no measurement. So any solution that only uses fillna(0) won't work.

I haven't found a proper way to do this yet. Here is an example of what I want to do:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'value': [0, 1, 1, 1, np.nan, np.nan, np.nan]},
                   index=pd.date_range("01/01/2020 00:00", "01/01/2020 01:00", freq='10min'))

df2 = pd.DataFrame({'value': [0, 5, 5, 5, 5, 5, np.nan]},
                   index=pd.date_range("01/01/2020 00:00", "01/01/2020 01:00", freq='10min'))

df1 + df2

What I get:

df1 + df2
                     value
2020-01-01 00:00:00    0.0
2020-01-01 00:10:00    6.0
2020-01-01 00:20:00    6.0
2020-01-01 00:30:00    6.0
2020-01-01 00:40:00    NaN
2020-01-01 00:50:00    NaN
2020-01-01 01:00:00    NaN

What I would like to get as a result:

                     value
2020-01-01 00:00:00    0.0
2020-01-01 00:10:00    6.0
2020-01-01 00:20:00    6.0
2020-01-01 00:30:00    6.0
2020-01-01 00:40:00    5.0
2020-01-01 00:50:00    5.0
2020-01-01 01:00:00    NaN

Does anybody know a clean way to do this?

Thank you.

(I'm using Python 3.9.1 and pandas 1.2.4)

CodePudding user response:

You can use add with the fill_value=0 option. The fill value only replaces a NaN when the other dataframe has a value at that position, so positions that are NaN in both dataframes stay NaN:

df1.add(df2, fill_value=0)

output:

                     value
2020-01-01 00:00:00    0.0
2020-01-01 00:10:00    6.0
2020-01-01 00:20:00    6.0
2020-01-01 00:30:00    6.0
2020-01-01 00:40:00    5.0
2020-01-01 00:50:00    5.0
2020-01-01 01:00:00    NaN
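
Since the question mentions adding several dataframes, the same call can be chained over a list. Here is a minimal sketch, assuming all frames share the same index and columns; the frames df_a, df_b, df_c and the helper nan_aware_sum are hypothetical names made up for this illustration, not part of pandas:

```python
from functools import reduce

import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01 00:00", periods=4, freq="10min")

# Three hypothetical measurement frames; NaN means "no measurement taken".
df_a = pd.DataFrame({"value": [0, 1, np.nan, np.nan]}, index=idx)
df_b = pd.DataFrame({"value": [0, 5, 5, np.nan]}, index=idx)
df_c = pd.DataFrame({"value": [0, np.nan, 2, np.nan]}, index=idx)


def nan_aware_sum(frames):
    """Add frames pairwise; a cell stays NaN only if it is NaN in every frame."""
    return reduce(lambda left, right: left.add(right, fill_value=0), frames)


total = nan_aware_sum([df_a, df_b, df_c])
print(total)
```

The last timestamp is NaN in all three frames, so it should stay NaN in total, while the other rows are summed with the missing values counted as 0.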

CodePudding user response:

Does this solve your problem?

df = df1.fillna(0) + df2.fillna(0)
df[df['value'] == 0.0] = np.nan
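
Note that replacing every 0.0 with NaN also wipes out the 00:00 row, where both dataframes hold a measured 0, which the question's EDIT rules out. A possible variation, sketched here under the assumption that both frames share the same index, is to restore NaN only where neither frame has a measurement. The names filled_sum and has_measurement are made up for this sketch:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01 00:00", "2020-01-01 01:00", freq="10min")
df1 = pd.DataFrame({"value": [0, 1, 1, 1, np.nan, np.nan, np.nan]}, index=idx)
df2 = pd.DataFrame({"value": [0, 5, 5, 5, 5, 5, np.nan]}, index=idx)

# Zero-fill both frames and add them.
filled_sum = df1.fillna(0) + df2.fillna(0)

# True where at least one frame holds a real measurement.
has_measurement = df1.notna() | df2.notna()

# Keep the sum where something was measured; put NaN back everywhere else.
result = filled_sum.where(has_measurement)
print(result)
```

With the data from the question, this should keep 0.0 at 00:00 and leave only 01:00 as NaN, matching what df1.add(df2, fill_value=0) returns.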