I have a dataframe with a grouped date and a count, if there is any gap in this time series I have to fill them with excess of previous stacks, and if no gaps extend the series until all counts = 1. These examples happen all same month
NOTE: day_date is a timestamp with daily frequency where missing values are 0, did integer for simplicity in example
An example with missing gaps but no previous stacks:
| day_date | stack |
| -------- | ----- |
| 1 | 0 |
| 2 | 2 |
Produces
| day_date | stack |
| -------- | ----- |
| 1 | 0 |
| 2 | 1 | #
| 3 | 1 | # The entire period flattents to a day frequency with value = 1
An example of days being over stacked and filling gaps:
| day_date | stack |
| -------- | ----- |
| 1 | 0 |
| 2 | 2 | #this row wont be able to fill until the 6th
| 6 | 3 | #this row and below will craete overlap
| 8 | 2 |
| 15 | 1 | # there is a big gap here that will get filled as much as possible from previous overlap
Produces:
| day_date | stack |
| -------- | ----- |
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 | # the previous staack coverd only until the 3rd.
| 5 | 0 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 | #Here is an overal of last stack from 6 and 2 days from 8, this results on the two days from 8 moving forward to fill gaps as the day is covered from past stack.
| 9 | 1 | # there is a big gap here that will get filled as much as possible from previous overlap from the 8th, which is 2 days that fill 9th and 10th.
| 10 | 1 |
| 11 | 0 |
| 12 | 0 |
| 13 | 0 |
| 14 | 0 |
| 15 | 1 | #last stack.
Note that the reason 9th and 10th have a 1 is because the excess from the date 8 which was covered since the big refill that happened the 6th and covered from 6th to 8th.
CodePudding user response:
EDIT: using timestamps
Maybe a more readable solution (for beginners) using for loops and a bunch of if statements:
import pandas as pd
lst = [[pd.Timestamp(year=2017, month=1, day=1), 0],
[pd.Timestamp(year=2017, month=1, day=2), 2],
[pd.Timestamp(year=2017, month=1, day=10), 3],
[pd.Timestamp(year=2017, month=2, day=1), 2],
[pd.Timestamp(year=2017, month=2, day=3), 2]]
df = pd.DataFrame(lst, columns=['day_date', 'stack'])
n_days = (df.day_date.max() - df.day_date.min()).days 1
stack = 0
for index in range(n_days):
stack = df.loc[index, 'stack']
# insert new day
if index 1 < len(df): # if you are not at the end of the dataframe
next_day = df.loc[index 1].day_date # compute the next day in dataframe
this_day = df.loc[index].day_date # compute this day
if df.loc[index, 'stack'] >= 1:
df.loc[index, 'stack'] = 1
stack -= 1
if this_day pd.DateOffset(1) != next_day: # if there is a gap in days
for new_day in range(1, (next_day - this_day).days):
if stack > 0:
df.loc[len(df)] = [this_day pd.DateOffset(new_day), 1]
stack -= 1
else:
df.loc[len(df)] = [this_day pd.DateOffset(new_day), 0]
df = df.sort_values('day_date').reset_index(drop=True)
else:
if df.loc[index, 'stack'] >= 1:
df.loc[index, 'stack'] = 1
stack -= 1
while stack >= 1:
this_day = df.loc[len(df)-1].day_date
df.loc[len(df)] = [this_day pd.DateOffset(1), 1]
stack -= 1
CodePudding user response:
This is not such an easy task (if needed to perform in a vectorial way).
You can calculate first the remainder days to carry them to the next date, then use reindexing to duplicate/fill the rows:
remainder = (df['stack'].add(df['day_date'].diff(-1))
.fillna(0, downcast='infer').clip(lower=0)
)
df2 = (df
# shift extra "stack" to next stack
.assign(stack=df['stack'].sub(remainder).add(remainder.shift(fill_value=0)))
# repeat rows using "stack" value with a minimum of 1
.loc[lambda d: d.index.repeat(d['stack'].clip(lower=1))]
# make stack>1 equal to 1
# and increment the days per group
.assign(stack=lambda d: d['stack'].clip(upper=1),
day_date=lambda d: d['day_date'].add(
(m:=d['day_date'].duplicated())
.astype(int)
.groupby((~m).cumsum())
.cumsum()
)
)
# fill missing days (all remaining lines)
.set_index('day_date')
.reindex(range(df['day_date'].min(), df['day_date'].max() 1))
.fillna(0, downcast='infer')
.reset_index()
)
output:
day_date stack
0 1 0
1 2 1
2 3 1
3 4 0
4 5 0
5 6 1
6 7 1
7 8 1
8 9 1
9 10 1
10 11 0
11 12 0
12 13 0
13 14 0
14 15 1