I have a dataframe with values every 3 hours, which I upsample to 1 hour. The new bins are kept empty with NaNs. I want to fill those NaNs with values which when summed are equal to the value of the bin that was not upsampled, and "de-sum" the value of the bin not-upsampled.
For example: I have 3 bins. The 3rd bin has a value of 3, bins 1 & 2 have NaNs. I want to fill bins 1,2 & 3 with 1 for each of them. In the end, if I had a cumulative sum every 3 bins, the result would be equal to the value of my bin before the upsampling.
I wrote an example of what I mean (sorry I have trouble explaining clearly). Is there a better way to do that ?
import numpy as np
import pandas as pd
from datetime import *
# Create df with a datetime index every 3 hours
rng = pd.date_range('2000-01-01', periods=365*(24/3), freq='3H')
df = pd.DataFrame({'Val': np.random.randn(len(rng)) }, index = rng)
# Upsample to 1H but keep the new bins empty
df = df.resample('1H').asfreq()
# Create a copy of df to verify that the sum went well
df_summed_every_3_bins = df.copy()
# Create a counter to the next bin holding a value
to_full_bin = 2
# We de-sum the first value
df.Val.values[0] = df.Val.values[0]/3
for i in range(1,len(df)):
# Take the value from a bin, divide it by 3 and insert it in the dataframe
df.Val.values[i] = df.Val.values[i to_full_bin]/3
# We move forward in df, meaning that the bin with a value is approaching. So we reduce the counter by 1
to_full_bin = to_full_bin-1
# When the variable is equal to -1, it means we need to reinitialized our counter
if to_full_bin == -1:
to_full_bin = 2
CodePudding user response:
Resample
the dataframe, then backfill
and divide by 3
df.resample('1H').bfill().div(3)
Val
2000-01-01 00:00:00 -0.747733
2000-01-01 01:00:00 -0.057699
2000-01-01 02:00:00 -0.057699
2000-01-01 03:00:00 -0.057699
2000-01-01 04:00:00 -0.409512
2000-01-01 05:00:00 -0.409512
2000-01-01 06:00:00 -0.409512
2000-01-01 07:00:00 -0.108856
2000-01-01 08:00:00 -0.108856
2000-01-01 09:00:00 -0.108856
...