Home > front end >  Dataframe-Compare list of columns' values against one column's value with multiple criteri
Dataframe-Compare list of columns' values against one column's value with multiple criteri

Time:04-17

I have the below dataframe:

Com1       a       b       c        d       e      f       g      
0.83389 0.83585 0.8351  0.83493 0.83683 0.83697 0.83622 0.83573    
0.83375 0.83585 0.8351  0.83493 0.83683 0.83487 0.83357 0.83147    
0.83366 0.83585 0.8351  0.83493 0.83683 0.83567 0.83487 0.83357    
0.834   0.83585 0.8351  0.83493 0.83567 0.83147 0.82807 0.82257    

What I need is to compare the letter columns to the Com1 column. In plain language, the formula is:

ColList = [a,b,c,d,e,f,g]
if any number in ColList is > Com1 and < (Com1  0.001):
    Result = 0
else:
    Result = 1

The desired result is:

Com1       a       b       c        d       e      f       g         Result
0.83389 0.83585 0.8351  [0.83473]   0.83683 0.83697 0.83622 0.83573    0
0.83375 0.83585 0.8351  0.83493 0.83683 0.83487 0.83357 0.83147        1
0.83366 0.83585 0.8351  0.83493 0.83683 0.83567 0.83487 0.83357        1
0.834   0.83585 0.8351  [0.83493]   0.83567 0.83147 0.82807 0.82257    0

(I bracketed the numbers that are causing the "0" result)

I tried to use np.where, but it did not yield the correct results when I manually compared in excel. The formula I tried was:

ColList = [a,b,c,d,e,f,g]
df['Result'] = np.where( (df[ColList ].gt(df['Com1'],axis=0).any(axis=1)) & (df[ColList ].lt(df['Com1']   0.001,axis=0).any(axis=1)),0,1)

I also considered using the between() function, but from my research, since its a list of columns, I am not having a series and it cannot be used.

I appreciate the assistance.

CodePudding user response:

For vectorized operations, you could use gt and lt methods to get boolean frames and combine them using & operator and combine across rows using any:

cols = df.drop(columns='Com1')
out = (1-(cols.gt(df['Com1'], axis=0) & 
       cols.lt(df['Com1'] 0.001, axis=0))
       .any(axis=1).astype(int))

Output:

0    1
1    1
2    1
3    0
dtype: int64

Note that the output I get here is different from the one you get because the input given is different from the input used to derive the desired output.

CodePudding user response:

You can use apply with axis=1 to work on the rows with the help of a function:

def between(s: pd.Series):
    ColList = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
    return int(not any(s[ColList].between(s['Com1'], s['Com1'] 0.001)))


df['result'] = df.apply(between, axis=1)

If you want you could also have the column list and column to compare to as argument and use partial before applying it.

  • Related