I have a dataset called df
that looks like this:
provider | fid | pid | datetime |
---|---|---|---|
CHE-223 | 2bfc9a62 | 2f43d557 | 2021-09-26T23:18:00 |
CHE-223 | fff669e9 | 295b82e2 | 2021-08-13T09:10:00 |
I wanted to create a new table called wave
that has categorical values for a range of date from datetime
. e.g. For the date from 16th of November 2019 until 28th of February 2020, it gives a value before covid and so on.
I used a loop function to achieve this and this is the code I used:
def wave(row):
if (row["datetime"] <= pd.Timestamp("2019-11-16")) & (row["datetime"] >= pd.Timestamp("2020-02-28")):
wave="before covid"
elif (row["datetime"] <= pd.Timestamp("2020-03-01")) & (row["datetime"] >= pd.Timestamp("2020-06-15")):
wave="1st wave"
elif (row["datetime"] <= pd.Timestamp("2020-06-16")) & (row["datetime"] >= pd.Timestamp("2020-09-30")):
wave="between waves"
elif (row["datetime"] <= pd.Timestamp("2020-10-01")) & (row["datetime"] >= pd.Timestamp("2021-01-15")):
wave="2nd wave"
df["wave"]=df.apply(lambda row:wave(row),axis=1)
But it gives me a column named wave
but with no values. How do I fix this and categorise the date?
CodePudding user response:
Your function needs to return something. Also your dates comparisons are inverted:
(row["datetime"] <= pd.Timestamp("2019-11-16")) & (row["datetime"] >= pd.Timestamp("2020-02-28"))
would match dates that are before the 16th of November 2019 and at the same time after the 28th of February 2020... which of course never happens.
Your function should look like:
def wave(row):
wave = ""
if (row["datetime"] >= pd.Timestamp("2019-11-16")) and (row["datetime"] <= pd.Timestamp("2020-02-28")):
wave="before covid"
elif (row["datetime"] >= pd.Timestamp("2020-03-01")) and (row["datetime"] <= pd.Timestamp("2020-06-15")):
wave="1st wave"
elif (row["datetime"] >= pd.Timestamp("2020-06-16")) and (row["datetime"] <= pd.Timestamp("2020-09-30")):
wave="between waves"
elif (row["datetime"] >= pd.Timestamp("2020-10-01")) and (row["datetime"] <= pd.Timestamp("2021-01-15")):
wave="2nd wave"
elif (row["datetime"] >= pd.Timestamp("2021-01-16")):
wave="after second wave"
return wave
Edit: also &
is a bit-wise operator. For logical expressions use and
.