I have a large dataframe df:
Col1 Col2 Col3 Val1 Val2
A1 B1 c1 0.2 -0.3
A1 B1 c2 -0.3 0.3
A1 B1 c3 0.5 0.2
A2 B2 c1 -0.3 0.1
A2 B2 c2 0.7 -0.3
A3 B3 c1 -0.3 0.3
A3 B3 c2 -0.2 0.3
A3 B3 c3 0.5 0.2
A3 B3 c4 0.8 0.7
Where there is alternating pattern in each group of Col1-Col2
for sign of Val1 and Val2
i.e. a pair where Val1
is positive and Val2
is negative and vice versa. I want to achieve the following:
Col1 Col2 Col3 Val1 Val2 Pattern
A1 B1 c1 0.2 -0.3 Y
A1 B1 c2 -0.3 0.3 Y
A1 B1 c3 0.5 0.2 Y
A2 B2 c1 -0.3 0.1 Y
A2 B2 c2 0.7 -0.3 Y
A3 B3 c1 -0.3 0.3 N
A3 B3 c2 -0.2 0.3 N
A3 B3 c3 -0.5 -0.2 N
A3 B3 c4 0.8 0.7 N
A1-B1
and A2-B2
has a pair of Val1
and Val2
with opposite sign but A3-B3 has none.
I am not sure how to proceed with the above considering dataframe is large.
Edit:
Reason A1-B1 is 'Y' is cause there is (0.2, -0.3) AND (-0.3, 0.3)
A2-B2 has (-0.3, 0.1) AND (0.7, -0.3)
A3-B3 doesn't have 2 such sets. It has only likes of (-0.3, 0.3) and nothing with Val1, Val2 of kind (positive, negative).
i.e. to classify as pattern it must have (positive, negative)
and (negative, positive)
CodePudding user response:
Use np.sign
checking with DataFrame.eq
and Groupby.transform
signs = np.sign(df[['Val1', 'Val2']])
m1 = signs.eq([1,-1]).all(axis=1)
m2 = signs.eq([-1,1]).all(axis=1)
df['Pattern'] = pd.concat([m1, m2], axis=1)\
.groupby([df['Col1'], df['Col2']])\
.transform('any').all(axis=1)\
.map({True:'Y', False: 'N'})
print(df)
Col1 Col2 Col3 Val1 Val2 Pattern
0 A1 B1 c1 0.2 -0.3 Y
1 A1 B1 c2 -0.3 0.3 Y
2 A1 B1 c3 0.5 0.2 Y
3 A2 B2 c1 -0.3 0.1 Y
4 A2 B2 c2 0.7 -0.3 Y
5 A3 B3 c1 -0.3 0.3 N
6 A3 B3 c2 -0.2 0.3 N
7 A3 B3 c3 0.5 0.2 N
8 A3 B3 c4 0.8 0.7 N
CodePudding user response:
You can groupby Col1
and Col2
column then use np.sign
to check the sign of value in Series. Then subtract the sign of Val1
and Val2
. If the sign of two numbers are opposite, the result would be 2
or -2
out = (df.groupby(['Col1', 'Col2'])
.apply(lambda g: 'Y'
if {2, -2}.issubset(set(np.sign(g['Val1']).sub(np.sign(g['Val2'])).unique()))
else 'N')
.to_frame('Pattern').reset_index())
print(out)
Col1 Col2 Pattern
0 A1 B1 Y
1 A2 B2 Y
2 A3 B3 N
At last, merge the result to original dataframe
df['Pattern'] = df.merge(out, on=['Col1', 'Col2'], how='left')['Pattern']
print(df)
Col1 Col2 Col3 Val1 Val2 Pattern
0 A1 B1 c1 0.2 -0.3 Y
1 A1 B1 c2 -0.3 0.3 Y
2 A1 B1 c3 0.5 0.2 Y
3 A2 B2 c1 -0.3 0.1 Y
4 A2 B2 c2 0.7 -0.3 Y
5 A3 B3 c1 -0.3 0.3 N
6 A3 B3 c2 -0.2 0.3 N
7 A3 B3 c3 0.5 0.2 N
8 A3 B3 c4 0.8 0.7 N
CodePudding user response:
You can compute boolean masks and combine them per group, then per row:
m1 = df['Val1'].lt(0) # Val1 negative
m2 = df['Val2'].lt(0) # Val2 negative
mask = (pd.concat([m1&~m2, # Val1 negative and Val2 positive
~m1&m2], # Val1 positive and Val2 negative
axis=1)
.groupby([df['Col1'], df['Col2']])
.transform('any') # is there at least one match per group?
.all(1) # were there both True for above?
)
df['Pattern'] = np.where(mask, 'Y', 'N')
output:
Col1 Col2 Col3 Val1 Val2 Pattern
0 A1 B1 c1 0.2 -0.3 Y
1 A1 B1 c2 -0.3 0.3 Y
2 A1 B1 c3 0.5 0.2 Y
3 A2 B2 c1 -0.3 0.1 Y
4 A2 B2 c2 0.7 -0.3 Y
5 A3 B3 c1 -0.3 0.3 N
6 A3 B3 c2 -0.2 0.3 N
7 A3 B3 c3 0.5 0.2 N
8 A3 B3 c4 0.8 0.7 N
intermediates:
Col1 Col2 Col3 Val1 Val2 Pattern m1&~m2 ~m1&m2 any(m1&~m2) any(~m1&m2) mask
0 A1 B1 c1 0.2 -0.3 Y False True True True True
1 A1 B1 c2 -0.3 0.3 Y True False True True True
2 A1 B1 c3 0.5 0.2 Y False False True True True
3 A2 B2 c1 -0.3 0.1 Y True False True True True
4 A2 B2 c2 0.7 -0.3 Y False True True True True
5 A3 B3 c1 -0.3 0.3 N True False True False False
6 A3 B3 c2 -0.2 0.3 N True False True False False
7 A3 B3 c3 0.5 0.2 N False False True False False
8 A3 B3 c4 0.8 0.7 N False False True False False