Home > Blockchain >  Pandas: check a sequence in one column for each unique value in another column
Pandas: check a sequence in one column for each unique value in another column

Time:08-13

I have a table that looks like this:

Date Unique id Indicator
2018 1 1
2019 1 0
2020 1 0
2020 2 1
2018 2 0
2019 2 1
2020 2 1
2021 2 1

For each value in "Unique id" I want to check whether "Indicator" match a special sequence of values (say, [1,1,1])

I have tried using df.groupby(['Unique id'])['Indicator].isin([1,1,1]), but the groupby method has no isin method.

The desired output would be:

Unique id Sequence match
1 False
2 True

CodePudding user response:

You can do rolling().apply() like this:

def check_pattern(s, pattern):
    return s.eq(pattern).all()

(df.groupby('Unique id', group_keys=False)['Indicator'].rolling(3)
   .apply(check_pattern, kwargs={'pattern':(1,1,1)})
   .groupby('Unique id').max().eq(1)
)

Output:

Unique id
1    False
2     True
Name: Indicator, dtype: bool

CodePudding user response:

I modified this solution for test per groups and then test if at least one value is 1 by GroupBy.any:

a = [1,1,1]

N = len(pat)
df1 = (df.groupby(['Unique id'])['Indicator']
               .rolling(window=N , min_periods=N)
               .apply(lambda x: (x==np.asarray(a)).all())
               .eq(1)
               .groupby(level=0)
               .any()
               .reset_index(name='Sequence match'))

print (df1)
   Unique id  Sequence match
0          1           False
1          2            True

CodePudding user response:

Assuming the Indicator column only has 1 and 0 values, as shown in the example, and taking inspiration from Jezrael's and Quang Hoang's answer you can try with:

df.groupby("Unique ID").rolling(3).apply(lambda x: True if sum(x) == 3 else False).groupby('Unique ID').max()

Where the lambda will return True only if the rolling sum of the 3 periods is equal to three (therefore signaling there's 3 consecutive 1's

           Indicator
Unique ID           
1                0.0
2                1.0

That when expressed in Booleans, it's equal to:

           Indicator
Unique ID           
1              False
2               True

CodePudding user response:

You may also try this:

def is_sublist(y, x):
    for i in range(len(x) - len(y)   1):
         if x[i:i len(y)] == y:
              return True
    else:
         return False

df.groupby('Unique id').Indicator.agg(pd.Series.tolist).apply(lambda x: is_sublist([1,1,1], x))
  • Related