I have a table that looks like this:
Date | Unique id | Indicator |
---|---|---|
2018 | 1 | 1 |
2019 | 1 | 0 |
2020 | 1 | 0 |
2020 | 2 | 1 |
2018 | 2 | 0 |
2019 | 2 | 1 |
2020 | 2 | 1 |
2021 | 2 | 1 |
For each value in "Unique id" I want to check whether "Indicator" match a special sequence of values (say, [1,1,1])
I have tried using df.groupby(['Unique id'])['Indicator].isin([1,1,1]), but the groupby method has no isin method.
The desired output would be:
Unique id | Sequence match |
---|---|
1 | False |
2 | True |
CodePudding user response:
You can do rolling().apply()
like this:
def check_pattern(s, pattern):
return s.eq(pattern).all()
(df.groupby('Unique id', group_keys=False)['Indicator'].rolling(3)
.apply(check_pattern, kwargs={'pattern':(1,1,1)})
.groupby('Unique id').max().eq(1)
)
Output:
Unique id
1 False
2 True
Name: Indicator, dtype: bool
CodePudding user response:
I modified this solution for test per groups and then test if at least one value is 1
by GroupBy.any
:
a = [1,1,1]
N = len(pat)
df1 = (df.groupby(['Unique id'])['Indicator']
.rolling(window=N , min_periods=N)
.apply(lambda x: (x==np.asarray(a)).all())
.eq(1)
.groupby(level=0)
.any()
.reset_index(name='Sequence match'))
print (df1)
Unique id Sequence match
0 1 False
1 2 True
CodePudding user response:
Assuming the Indicator column only has 1 and 0 values, as shown in the example, and taking inspiration from Jezrael's and Quang Hoang's answer you can try with:
df.groupby("Unique ID").rolling(3).apply(lambda x: True if sum(x) == 3 else False).groupby('Unique ID').max()
Where the lambda will return True only if the rolling sum of the 3 periods is equal to three (therefore signaling there's 3 consecutive 1's
Indicator
Unique ID
1 0.0
2 1.0
That when expressed in Booleans, it's equal to:
Indicator
Unique ID
1 False
2 True
CodePudding user response:
You may also try this:
def is_sublist(y, x):
for i in range(len(x) - len(y) 1):
if x[i:i len(y)] == y:
return True
else:
return False
df.groupby('Unique id').Indicator.agg(pd.Series.tolist).apply(lambda x: is_sublist([1,1,1], x))