I have a dataframe with data from two forecast cycles, one starting at 00 UTC and going up to 168 hours forecast (Forecast (valid time column) and another cycle starting at 12 UTC and also going up to 168 hours forecast. Based on dataframe that I have below, I would like to create a column called cycle that corresponds to the Date-Time column of which forecast cycle the data refers. For example:
Date-Time Cycle
2020-07-16 00:00:00 00
2020-07-16 00:00:00 12
How can I do this?
My dataframe looks like this:
CodePudding user response:
IIUC, you can use pandas.Series.where
with pandas.Series.ffill
:
import numpy
df = pd.read_csv("df.csv", sep=";", index_col=0, usecols=[0,1,2])
df['Date-Time'] = pd.to_datetime(df['Date-Time'])
#is it the start of the cycle ?
m = df["Forecast (valid time)"].eq(0)
df["Cycle"] = df["Date-Time"].dt.hour.where(m).ffill()
Output :
print(df.groupby("Cycle").head(5))
Date-Time Forecast (valid time) Cycle
0 2020-07-16 00:00:00 0.0 0
1 2020-07-16 03:00:00 3.0 0
2 2020-07-16 06:00:00 6.0 0
3 2020-07-16 09:00:00 9.0 0
4 2020-07-16 12:00:00 12.0 0
57 2020-07-16 12:00:00 0.0 12
58 2020-07-16 15:00:00 3.0 12
59 2020-07-16 18:00:00 6.0 12
60 2020-07-16 21:00:00 9.0 12
61 2020-07-17 00:00:00 12.0 12