Home > Mobile >  (Pandas) Comparing the values of N dataframe columns with each other and check whether they are in a
(Pandas) Comparing the values of N dataframe columns with each other and check whether they are in a

Time:01-07

I am looking for a way to compare multiple columns with each other when the number of columns is not explicitly known.

Specifically, given N columns, I want to create an additional column called 'result' where the value of each row will be:

  • 1 if col(0) > col(1) > col(2) > ... > col(N-1) > col(N) for that row
  • -1 if the opposite is true (col(0) < col(1) < ... < col(N-1) < col(N)
  • 0 if none of the above is True

For example, using the following dataframe

df = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,1],
                   'col2': [4,3,2,1,0,-1,-2,-3,1],
                   'col3': [8,6,4,2,0,-2,-4,-6,1]})

   col1  col2  col3
0     1     4     8
1     2     3     6
2     3     2     4
3     4     1     2
4     5     0     0
5     6    -1    -2
6     7    -2    -4
7     8    -3    -6
8     1     1     1

I should get the following result column

   col1  col2  col3  result
0     1     4     8      -1
1     2     3     6      -1
2     3     2     4       0
3     4     1     2       0
4     5     0     0       0
5     6    -1    -2       1
6     7    -2    -4       1
7     8    -3    -6       1
8     1     1     1       0

with 3 columns I could simply do

condition1 = (df['col1'] > df['col2']) & (df['col2'] > df['col3'])
condition2 = (df['col1'] < df['col2']) & (df['col2'] < df['col3'])

df['result'] = np.select([condition1,condition2], [1,-1], 0)

the problem is that this quickly becomes awfully inefficient as the number of columns increases.

Something I have tried is converting the columns to lists, get the first element of each list, check if they are descending or ascending (using a function that i found searching for this answer) and then creating a 'results list' from that.

#Checking whether the list given is in Ascending or Descending order
def isOrdered(some_list):
    isAscending = True
    isDescending = True
    for i in range(1,len(some_list)):
        if(some_list[i] >= some_list[i-1]):
            isDescending = False
        elif(some_list[i] <= some_list[i-1]):
            isAscending = False
    if(isAscending):
        return -1
    if(isDescending):
        return 1
    return 0

#Converting the columns to lists and compare the nth elements of each, one at a time
#The columns are guaranteed to be of the same length
col_list = [df[x].to_list() for x in df.columns]
result_list = []
n=0
while n in range(len(col_list[0])):
    tmp_lst = []
    for idx in range(len(col_list)):
        tmp_lst.append(col_list[idx][n])
    result_list.append(isOrdered(tmp_lst))
    n  =1

df['result'] = result_list

this gives me the following DataFrame (It has the problem that it returns -1 instead of 0 if all the values are the same, but I can live with it as long as it accurately tells me whether the columns are Ascending or notAscending)

   col1  col2  col3  result
0     1     4     8      -1
1     2     3     6      -1
2     3     2     4       0
3     4     1     2       0
4     5     0     0       0
5     6    -1    -2       1
6     7    -2    -4       1
7     8    -3    -6       1
8     1     1     1      -1

This approach doesn't look very pretty and I doubt it is efficient. Is there a better way to accomplish this?

CodePudding user response:

You can calculate diff along rows and then check if all diffs in a row are greater or less than 0:

import numpy as np
diff = df.diff(axis=1).iloc[:,1:]
df['result'] = np.where((diff > 0).all(axis=1), -1, np.where((diff < 0).all(axis=1), 1, 0))
df
   col1  col2  col3  result
0     1     4     8      -1
1     2     3     6      -1
2     3     2     4       0
3     4     1     2       0
4     5     0     0       0
5     6    -1    -2       1
6     7    -2    -4       1
7     8    -3    -6       1
8     1     1     1       0

CodePudding user response:

dirs = np.sign(df.diff(-1, axis="columns")).iloc[:, :-1]

df["result"] = np.select([dirs.eq(1).all(axis="columns"),
                          dirs.eq(-1).all(axis="columns")],
                         [1,
                          -1],
                         default=0)
  • get the "directions" for each row

    • this is the sign of the consecutive differences from left to right (diff(-1, "columns"))
    • since there's no next value at the end, chop it up (iloc[:, :-1])
  • if the directions are all equal to 1 => put 1 to the result

  • elsif they are all equal to -1 => put -1 to the result

  • otherwise, put the default value, i.e., 0

(np.select is the vectorized if-elif-..-else.)

to get

>>> df

   col1  col2  col3  result
0     1     4     8      -1
1     2     3     6      -1
2     3     2     4       0
3     4     1     2       0
4     5     0     0       0
5     6    -1    -2       1
6     7    -2    -4       1
7     8    -3    -6       1
8     1     1     1       0

where the dirs was

>>> dirs

   col1  col2
0    -1    -1
1    -1    -1
2     1    -1
3     1    -1
4     1     0
5     1     1
6     1     1
7     1     1
8     0     0
  • Related