I have the following dataframe:
ID A1 A2 B1 B2 C1 C2
AA 1 3 7 4
BB
CC
DD 2 5 A 1
I want to know if any value is used in either column C1 or C2 if the adjacent rows have empty values. In the example above, the output would be true only in row DD since a value is found in C2 and the B1 and B2 are empty. I want to examine each row and know if this instance ever occurs.
I've tried several different ways to get this done. I mostly use the .all()
and .any()
methods, but I haven't had any success. These methods seem to be absolute, but I'm looking to examine by each row instead. Any guidance is greatly appreciated.
CodePudding user response:
Create seperate boolean conditions, then filter your dataframe the target ID
## replace whitespace with actually na values.
import numpy as np
df = df.replace('\s',np.nan)
s = (~df[['C1','C2']].isnull()).sum(axis=1).gt(0)
s1 = df.filter(regex='A|B').isnull().any(axis=1)
print(df.loc[s & s1,'ID'])
3 DD
Name: ID, dtype: object
Boolean filtering
print(df)
ID A1 A2 B1 B2 C1 C2
0 AA 1.0 3.0 7.0 4.0 NaN NaN
1 BB NaN NaN NaN NaN NaN NaN
2 CC NaN NaN NaN NaN NaN NaN
3 DD 2.0 5.0 NaN NaN A 1.0
print((~df[['C1','C2']].isnull()))
C1 C2
0 False False
1 False False
2 False False
3 True True
^ these can be turned into sums to measure conditions.