Groupby remainder of day to show True if identical consecutive values

Time:09-14

Given the following DataFrame, how do I add a new column that shows True for the rest of the day once two consecutive "y" values have been seen in the val column within that day (and False otherwise)?

  • Each day resets the logic.

This is close, but True should be set on every row of the day after the condition has been seen.

Code

import pandas as pd

df_so = pd.DataFrame(
    {
        "val": list("yynnnyyynn")
    },
    index=pd.date_range(start="1/1/2018", periods=10, freq="6h"),
)

                   val
2018-01-01 00:00:00 y
2018-01-01 06:00:00 y
2018-01-01 12:00:00 n
2018-01-01 18:00:00 n
2018-01-02 00:00:00 n
2018-01-02 06:00:00 y
2018-01-02 12:00:00 y
2018-01-02 18:00:00 y
2018-01-03 00:00:00 n
2018-01-03 06:00:00 n

Desired output

                    val  out
2018-01-01 00:00:00  y   False
2018-01-01 06:00:00  y   False
2018-01-01 12:00:00  n   True
2018-01-01 18:00:00  n   True
2018-01-02 00:00:00  n   False
2018-01-02 06:00:00  y   False
2018-01-02 12:00:00  y   False
2018-01-02 18:00:00  y   True
2018-01-03 00:00:00  n   False
2018-01-03 06:00:00  n   False

Answer:

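Within each day, a rolling sum over the "y" flags detects a run of consecutive "y" values, shift moves that signal to the row after the run, and cummax keeps it True once the condition has held at some point earlier in the day: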

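target = 2  # number of consecutive "y" values required
df_so['out'] = (df_so['val'].eq('y')
                    # normalize() drops the time of day, so each calendar day is one group
                    .groupby(df_so.index.normalize())
                    # rolling sum == target marks the end of a run; shift() starts the flag on the next row
                    .transform(lambda x: x.rolling(target).sum().shift().eq(target).cummax())
               )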

Output:

                    val    out
2018-01-01 00:00:00   y  False
2018-01-01 06:00:00   y  False
2018-01-01 12:00:00   n   True
2018-01-01 18:00:00   n   True
2018-01-02 00:00:00   n  False
2018-01-02 06:00:00   y  False
2018-01-02 12:00:00   y  False
2018-01-02 18:00:00   y   True
2018-01-03 00:00:00   n  False
2018-01-03 06:00:00   n  False
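
To make the individual steps easier to follow, here is a minimal sketch that splits the one-liner into named stages and wraps it in a function. The names flag_rest_of_day and per_day are hypothetical helpers introduced only for illustration, and the cast to int before the rolling sum is an extra precaution rather than something the answer above requires.

```python
import pandas as pd


def flag_rest_of_day(val, target=2, match="y"):
    """Hypothetical helper: True on every row of a day that follows
    `target` consecutive `match` values earlier in the same day."""
    is_match = val.eq(match).astype(int)  # 1 where the value matches, else 0

    def per_day(s):
        run_sum = s.rolling(target).sum()     # equals target where a full run ends
        started = run_sum.shift().eq(target)  # flag begins on the row after the run
        return started.cummax()               # stays True once it has been set

    # normalize() drops the time of day, so each calendar day is one group
    return is_match.groupby(val.index.normalize()).transform(per_day)


df_so = pd.DataFrame(
    {"val": list("yynnnyyynn")},
    index=pd.date_range(start="1/1/2018", periods=10, freq="6h"),
)
df_so["out"] = flag_rest_of_day(df_so["val"])
```

Because the rolling window, the shift, and cummax all run inside a single day's group, a run of "y" values cannot carry over midnight, and the flag only appears on rows after the run is complete. With target=3, for example, the three "y" values on 2018-01-02 end on the last row of that day, so no row is flagged.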