Home > database >  Pandas groupby spotting a pattern between columns
Pandas groupby spotting a pattern between columns

Time:05-19

I have a large dataframe df:

Col1    Col2    Col3    Val1    Val2
A1      B1      c1      0.2    -0.3
A1      B1      c2     -0.3     0.3
A1      B1      c3      0.5     0.2
A2      B2      c1     -0.3     0.1
A2      B2      c2      0.7    -0.3
A3      B3      c1     -0.3     0.3
A3      B3      c2     -0.2     0.3
A3      B3      c3      0.5     0.2
A3      B3      c4      0.8     0.7

Where there is alternating pattern in each group of Col1-Col2 for sign of Val1 and Val2 i.e. a pair where Val1 is positive and Val2 is negative and vice versa. I want to achieve the following:

Col1    Col2    Col3    Val1    Val2  Pattern
A1      B1      c1      0.2    -0.3   Y
A1      B1      c2     -0.3     0.3   Y
A1      B1      c3      0.5     0.2   Y
A2      B2      c1     -0.3     0.1   Y
A2      B2      c2      0.7    -0.3   Y
A3      B3      c1     -0.3     0.3   N
A3      B3      c2     -0.2     0.3   N
A3      B3      c3     -0.5    -0.2   N
A3      B3      c4      0.8     0.7   N

A1-B1 and A2-B2 has a pair of Val1 and Val2 with opposite sign but A3-B3 has none.

I am not sure how to proceed with the above considering dataframe is large.

Edit:

Reason A1-B1 is 'Y' is cause there is (0.2, -0.3) AND (-0.3, 0.3)

A2-B2 has (-0.3, 0.1) AND (0.7, -0.3)

A3-B3 doesn't have 2 such sets. It has only likes of (-0.3, 0.3) and nothing with Val1, Val2 of kind (positive, negative).

i.e. to classify as pattern it must have (positive, negative) and (negative, positive)

CodePudding user response:

Use np.sign checking with DataFrame.eq and Groupby.transform

signs = np.sign(df[['Val1', 'Val2']])
m1 = signs.eq([1,-1]).all(axis=1)
m2 = signs.eq([-1,1]).all(axis=1)
df['Pattern'] = pd.concat([m1, m2], axis=1)\
                  .groupby([df['Col1'], df['Col2']])\
                  .transform('any').all(axis=1)\
                  .map({True:'Y', False: 'N'})
print(df)
  Col1 Col2 Col3  Val1  Val2 Pattern
0   A1   B1   c1   0.2  -0.3       Y
1   A1   B1   c2  -0.3   0.3       Y
2   A1   B1   c3   0.5   0.2       Y
3   A2   B2   c1  -0.3   0.1       Y
4   A2   B2   c2   0.7  -0.3       Y
5   A3   B3   c1  -0.3   0.3       N
6   A3   B3   c2  -0.2   0.3       N
7   A3   B3   c3   0.5   0.2       N
8   A3   B3   c4   0.8   0.7       N

CodePudding user response:

You can groupby Col1 and Col2 column then use np.sign to check the sign of value in Series. Then subtract the sign of Val1 and Val2. If the sign of two numbers are opposite, the result would be 2 or -2

out = (df.groupby(['Col1', 'Col2'])
       .apply(lambda g: 'Y'
              if {2, -2}.issubset(set(np.sign(g['Val1']).sub(np.sign(g['Val2'])).unique()))
              else 'N')
       .to_frame('Pattern').reset_index())
print(out)

  Col1 Col2 Pattern
0   A1   B1       Y
1   A2   B2       Y
2   A3   B3       N

At last, merge the result to original dataframe

df['Pattern'] = df.merge(out, on=['Col1', 'Col2'], how='left')['Pattern']
print(df)

  Col1 Col2 Col3  Val1  Val2 Pattern
0   A1   B1   c1   0.2  -0.3       Y
1   A1   B1   c2  -0.3   0.3       Y
2   A1   B1   c3   0.5   0.2       Y
3   A2   B2   c1  -0.3   0.1       Y
4   A2   B2   c2   0.7  -0.3       Y
5   A3   B3   c1  -0.3   0.3       N
6   A3   B3   c2  -0.2   0.3       N
7   A3   B3   c3   0.5   0.2       N
8   A3   B3   c4   0.8   0.7       N

CodePudding user response:

You can compute boolean masks and combine them per group, then per row:

m1 = df['Val1'].lt(0)  # Val1 negative
m2 = df['Val2'].lt(0)  # Val2 negative

mask = (pd.concat([m1&~m2,  # Val1 negative and Val2 positive
                   ~m1&m2], # Val1 positive and Val2 negative
                  axis=1)
          .groupby([df['Col1'], df['Col2']])
          .transform('any')  # is there at least one match per group?
          .all(1)            # were there both True for above?
        )

df['Pattern'] = np.where(mask, 'Y', 'N')

output:

  Col1 Col2 Col3  Val1  Val2 Pattern
0   A1   B1   c1   0.2  -0.3       Y
1   A1   B1   c2  -0.3   0.3       Y
2   A1   B1   c3   0.5   0.2       Y
3   A2   B2   c1  -0.3   0.1       Y
4   A2   B2   c2   0.7  -0.3       Y
5   A3   B3   c1  -0.3   0.3       N
6   A3   B3   c2  -0.2   0.3       N
7   A3   B3   c3   0.5   0.2       N
8   A3   B3   c4   0.8   0.7       N

intermediates:

  Col1 Col2 Col3  Val1  Val2 Pattern  m1&~m2  ~m1&m2  any(m1&~m2)  any(~m1&m2)   mask
0   A1   B1   c1   0.2  -0.3       Y   False    True         True         True   True
1   A1   B1   c2  -0.3   0.3       Y    True   False         True         True   True
2   A1   B1   c3   0.5   0.2       Y   False   False         True         True   True
3   A2   B2   c1  -0.3   0.1       Y    True   False         True         True   True
4   A2   B2   c2   0.7  -0.3       Y   False    True         True         True   True
5   A3   B3   c1  -0.3   0.3       N    True   False         True        False  False
6   A3   B3   c2  -0.2   0.3       N    True   False         True        False  False
7   A3   B3   c3   0.5   0.2       N   False   False         True        False  False
8   A3   B3   c4   0.8   0.7       N   False   False         True        False  False
  • Related