Home > Net >  Categorise the datetime in to a new column based on the range
Categorise the datetime in to a new column based on the range

Time:04-28

I have a dataset called df that looks like this:

provider fid pid datetime
CHE-223 2bfc9a62 2f43d557 2021-09-26T23:18:00
CHE-223 fff669e9 295b82e2 2021-08-13T09:10:00

I wanted to create a new table called wave that has categorical values for a range of date from datetime. e.g. For the date from 16th of November 2019 until 28th of February 2020, it gives a value before covid and so on.

I used a loop function to achieve this and this is the code I used:

def wave(row):
    if (row["datetime"] <= pd.Timestamp("2019-11-16")) & (row["datetime"] >= pd.Timestamp("2020-02-28")):
        wave="before covid"
    elif (row["datetime"] <= pd.Timestamp("2020-03-01")) & (row["datetime"] >= pd.Timestamp("2020-06-15")):
        wave="1st wave"
    elif (row["datetime"] <= pd.Timestamp("2020-06-16"))  & (row["datetime"] >= pd.Timestamp("2020-09-30")):
        wave="between waves"
    elif (row["datetime"] <= pd.Timestamp("2020-10-01")) & (row["datetime"] >= pd.Timestamp("2021-01-15")):
        wave="2nd wave"

df["wave"]=df.apply(lambda row:wave(row),axis=1)

But it gives me a column named wave but with no values. How do I fix this and categorise the date?

CodePudding user response:

Your function needs to return something. Also your dates comparisons are inverted:

(row["datetime"] <= pd.Timestamp("2019-11-16")) & (row["datetime"] >= pd.Timestamp("2020-02-28"))

would match dates that are before the 16th of November 2019 and at the same time after the 28th of February 2020... which of course never happens.

Your function should look like:

def wave(row):
    wave = ""
    if (row["datetime"] >= pd.Timestamp("2019-11-16")) and (row["datetime"] <= pd.Timestamp("2020-02-28")):
        wave="before covid"
    elif (row["datetime"] >= pd.Timestamp("2020-03-01")) and (row["datetime"] <= pd.Timestamp("2020-06-15")):
        wave="1st wave"
    elif (row["datetime"] >= pd.Timestamp("2020-06-16"))  and (row["datetime"] <= pd.Timestamp("2020-09-30")):
        wave="between waves"
    elif (row["datetime"] >= pd.Timestamp("2020-10-01")) and (row["datetime"] <= pd.Timestamp("2021-01-15")):
        wave="2nd wave"
    elif (row["datetime"] >= pd.Timestamp("2021-01-16")):
        wave="after second wave"
    return wave

Edit: also & is a bit-wise operator. For logical expressions use and.

  • Related