Home > OS >  Find n or more Consecutive apprearance of value
Find n or more Consecutive apprearance of value

Time:07-07

Dataframe:

d = {'id': [1,1,1,1,1,1,2,2,2,2,3], 'log_date' : ['2021-01-05 07:23:00', '2021-01-05 07:24:00', '2021-01-05 07:25:00', '2021-01-05 07:26:00', '2021-01-05 07:27:00', '2021-01-05 07:28:00', '2021-01-06 07:23:00', '2021-01-06 07:24:00', '2021-01-06 07:25:00', '2021-01-06 07:23:00', '2021-01-05 07:20:00'], 'is_bool': [False,True,True,True,True,True,False,True,True,True,True ]}
df = pd.DataFrame(data=d)
df

Goal / Output: Filtered dataframe with unique ID and few other columns which have 4 or more consecutive value of True

| id | log_date   | max_count |
|----|------------|-----------|
| 1  | 2021-01-05 | 5         |

Attempt: Create a new column counter and add counter comparing lag bool value. Then filter rows with value ge 4.

Pseudocode function:

for i in range(len(df)):
  if i == 0:
    if df.loc[i,'is_bool']:
      df.loc[i,'counter'] = 1
    else:
      df.loc[i,'counter'] = 0
  elif df.loc[i,'id'] == df.loc[i-1,'id'] and df.loc[i,'is_bool']:
    df.loc[i,'counter'] = df.loc[i-1,'counter']   1
  else:
    df.loc[i,'counter'] = 0

How can I use apply method to call the function. Or any better way of achieving this.

CodePudding user response:

We can do rolling with groupby

n = 4
df['counter'] = (df.groupby('id')['is_bool'].rolling(n, min_periods=1).sum()
                .reset_index(level=0,drop=True))   
Out[163]: 
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    0.0
6    1.0
7    2.0
8    3.0
9    1.0
Name: is_bool, dtype: float64  

                                                              

Update

cond = df.groupby([df['id'],(~df['is_bool']).cumsum()])['is_bool'].cumsum()
out = df[cond == cond.max()].copy()
out['max_count'] = cond.max()
out
Out[220]: 
   id             log_date  is_bool  max_count
5   1  2021-01-05 07:28:00     True          5   

CodePudding user response:

Only keeping groups with 4 or more consecutive True

df.groupby('id').filter(lambda x: x['is_bool'].rolling(4).sum().max() >= 4)

Output:

   id  is_bool
0   1    False
1   1     True
2   1     True
3   1     True
4   1     True

Opposite:

df.groupby('id').filter(lambda x: x['is_bool'].rolling(4).sum().max() < 4)

Output:

   id  is_bool
5   2    False
6   2     True
7   2     True
8   2     True

CodePudding user response:

Converting the bool entries to integers and using the fact that at least 4 subsequent True entries (1 when converted to integers) have a rolling sum >=4 yields

list(df[df['is_bool'].astype(int).rolling(4).sum() >= 4.0]['id'])
> [1, 3]
  • Related