Home > Enterprise >  Sum dataframe rows for missing timestamps
Sum dataframe rows for missing timestamps

Time:12-02

I would like to know if there is a way to calculate a sum over different dataframe rows when they have different rows' number due to missing timestamps. I have the following example:

import pandas as pd

data1 = {'produced':  [19.7, 39.1, 86.4, 167.1]}
data2 = {'produced':  [22.4, 95, 144.3, 300.2]}
data3 = {'produced':  [15.1, 44.1, 80, 302.5]}

df1 = pd.DataFrame(data1, index = ['01/06/2021 09:35', '01/06/2021 09:40', '01/06/2021 09:45', '01/06/2021 09:50'])
df2 = pd.DataFrame(data2, index = ['01/06/2021 09:35', '01/06/2021 09:45', '01/06/2021 09:50', '01/06/2021 09:55'])
df3 = pd.DataFrame(data3, index = ['01/06/2021 09:35', '01/06/2021 09:40', '01/06/2021 09:45', '01/06/2021 09:55'])

As you can see, I have some dfs with distinct row numbers because there are a few "blank" values. What I would like to do is to calculate the sum, over rows, for each 5min. interval and simply skip the blanks. If I sum it over the rows as it is, I would be incorrectly summing different timestamps.

Also, wouldbe possible to store the missing timestamps in an array or something similar? Thank you in advance!

CodePudding user response:

concat then sum:

pd.concat([df1,df2,df3]).sum(level=0)
# pd.concat([df1,df2,df3]).groupby(level=0).sum()

Output:

                  produced
01/06/2021 09:35      57.2
01/06/2021 09:40      83.2
01/06/2021 09:45     261.4
01/06/2021 09:50     311.4
01/06/2021 09:55     602.7

CodePudding user response:

You can also merge the dataframes on indices and sum across rows. That way, you can see the dataframes that had a missing values.

df_total = df1.add_suffix("_df1")
for i, df in enumerate([df2, df3], start=2):
    df = df.add_suffix("_df{}".format(i))
    df_total = df_total.merge(df, right_index=True, left_index=True, how="outer")
df_total["sum"] = df_total.sum(axis=1)
print(df_total)

Output:

                  produced_df1  produced_df2  produced_df3    sum
01/06/2021 09:35          19.7          22.4          15.1   57.2
01/06/2021 09:40          39.1           NaN          44.1   83.2
01/06/2021 09:45          86.4          95.0          80.0  261.4
01/06/2021 09:50         167.1         144.3           NaN  311.4
01/06/2021 09:55           NaN         300.2         302.5  602.7
  • Related