My data looks like this:
Date AcuteLegs_1day
74 2021-12-09 450.0
75 2021-12-10 840.0
76 2021-12-13 900.0
77 2021-12-14 630.0
78 2021-12-15 400.0
79 2021-12-16 360.0
80 2021-12-19 150.0
81 2021-12-20 600.0
82 2021-12-21 600.0
I want a new column with the sum off this load off the last 7 days: Now I use:
df['AcuteLegs_7days']=df["AcuteLegs_1day"].rolling(7).sum()
df['AcuteLegs_7days']=df['AcuteLegs_7days'].shift(1)
This works nice. But it takes the sum from the last 7 rows. But because sometimes there is no data from a date the last seven rows is not the last 7 days.
What I want is if a date is missing, this will add 0 to the trainingload sum of 7 days.
CodePudding user response:
You can resample to create the missing day and fill with 0 before doing sum.
df = df.resample('D').sum().fillna(0)
CodePudding user response:
I am providing two approaches because your question seems little bit ambiguous:
First approach:
# Convert the Date column to a datetime type
df['Date'] = pd.to_datetime(df['Date'])
# Create a new column to store the sum of the last 7 days
df['AcuteLegs_7days'] = df.AcuteLegs_1day.rolling(7, min_periods=1).sum()
print(df)
Output:
Date AcuteLegs_1day AcuteLegs_7days
0 2021-12-09 450.0 450.0
1 2021-12-10 840.0 1290.0
2 2021-12-13 900.0 2190.0
3 2021-12-14 630.0 2820.0
4 2021-12-15 400.0 3220.0
5 2021-12-16 360.0 3580.0
6 2021-12-19 150.0 3730.0
7 2021-12-20 600.0 3880.0
8 2021-12-21 600.0 3640.0
Second approach:
# Convert the Date column to a datetime type
df['Date'] = pd.to_datetime(df['Date'])
# Set the Date column as the index of the dataframe
df.set_index('Date', inplace=True)
# Resample the data to daily frequency, filling missing values with 0
df = df.resample('D').asfreq().fillna(0)
df['AcuteLegs_7days'] = df['AcuteLegs_1day'].rolling(7, min_periods=1).sum()
print(df)
Output:
AcuteLegs_1day AcuteLegs_7days
Date
2021-12-09 450.0 450.0
2021-12-10 840.0 1290.0
2021-12-11 0.0 1290.0
2021-12-12 0.0 1290.0
2021-12-13 900.0 2190.0
2021-12-14 630.0 2820.0
2021-12-15 400.0 3220.0
2021-12-16 360.0 3130.0
2021-12-17 0.0 2290.0
2021-12-18 0.0 2290.0
2021-12-19 150.0 2440.0
2021-12-20 600.0 2140.0
2021-12-21 600.0 2110.0
CodePudding user response:
To compute the sum of the AcuteLegs_1day column over the last 7 days, including days with missing values and treating them as 0, you can use the resample method and the sum function.
df['AcuteLegs_7days'] = df.resample('7D', on='Date')['AcuteLegs_1day'].sum()