I am looking for a way to compare multiple columns with each other when the number of columns is not explicitly known.
Specifically, given N columns, I want to create an additional column called 'result'
where the value of each row will be:
1
ifcol(0) > col(1) > col(2) > ... > col(N-1) > col(N)
for that row-1
if the opposite is true (col(0) < col(1) < ... < col(N-1) < col(N)
0
if none of the above is True
For example, using the following dataframe
df = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,1],
'col2': [4,3,2,1,0,-1,-2,-3,1],
'col3': [8,6,4,2,0,-2,-4,-6,1]})
col1 col2 col3
0 1 4 8
1 2 3 6
2 3 2 4
3 4 1 2
4 5 0 0
5 6 -1 -2
6 7 -2 -4
7 8 -3 -6
8 1 1 1
I should get the following result column
col1 col2 col3 result
0 1 4 8 -1
1 2 3 6 -1
2 3 2 4 0
3 4 1 2 0
4 5 0 0 0
5 6 -1 -2 1
6 7 -2 -4 1
7 8 -3 -6 1
8 1 1 1 0
with 3 columns I could simply do
condition1 = (df['col1'] > df['col2']) & (df['col2'] > df['col3'])
condition2 = (df['col1'] < df['col2']) & (df['col2'] < df['col3'])
df['result'] = np.select([condition1,condition2], [1,-1], 0)
the problem is that this quickly becomes awfully inefficient as the number of columns increases.
Something I have tried is converting the columns to lists, get the first element of each list, check if they are descending or ascending (using a function that i found searching for this answer) and then creating a 'results list' from that.
#Checking whether the list given is in Ascending or Descending order
def isOrdered(some_list):
isAscending = True
isDescending = True
for i in range(1,len(some_list)):
if(some_list[i] >= some_list[i-1]):
isDescending = False
elif(some_list[i] <= some_list[i-1]):
isAscending = False
if(isAscending):
return -1
if(isDescending):
return 1
return 0
#Converting the columns to lists and compare the nth elements of each, one at a time
#The columns are guaranteed to be of the same length
col_list = [df[x].to_list() for x in df.columns]
result_list = []
n=0
while n in range(len(col_list[0])):
tmp_lst = []
for idx in range(len(col_list)):
tmp_lst.append(col_list[idx][n])
result_list.append(isOrdered(tmp_lst))
n =1
df['result'] = result_list
this gives me the following DataFrame (It has the problem that it returns -1 instead of 0 if all the values are the same, but I can live with it as long as it accurately tells me whether the columns are Ascending or notAscending)
col1 col2 col3 result
0 1 4 8 -1
1 2 3 6 -1
2 3 2 4 0
3 4 1 2 0
4 5 0 0 0
5 6 -1 -2 1
6 7 -2 -4 1
7 8 -3 -6 1
8 1 1 1 -1
This approach doesn't look very pretty and I doubt it is efficient. Is there a better way to accomplish this?
CodePudding user response:
You can calculate diff along rows and then check if all diffs in a row are greater or less than 0:
import numpy as np
diff = df.diff(axis=1).iloc[:,1:]
df['result'] = np.where((diff > 0).all(axis=1), -1, np.where((diff < 0).all(axis=1), 1, 0))
df
col1 col2 col3 result
0 1 4 8 -1
1 2 3 6 -1
2 3 2 4 0
3 4 1 2 0
4 5 0 0 0
5 6 -1 -2 1
6 7 -2 -4 1
7 8 -3 -6 1
8 1 1 1 0
CodePudding user response:
dirs = np.sign(df.diff(-1, axis="columns")).iloc[:, :-1]
df["result"] = np.select([dirs.eq(1).all(axis="columns"),
dirs.eq(-1).all(axis="columns")],
[1,
-1],
default=0)
get the "directions" for each row
- this is the sign of the consecutive differences from left to right (diff(-1, "columns"))
- since there's no next value at the end, chop it up (iloc[:, :-1])
if the directions are all equal to 1 => put 1 to the result
elsif they are all equal to -1 => put -1 to the result
otherwise, put the default value, i.e., 0
(np.select is the vectorized if-elif-..-else.)
to get
>>> df
col1 col2 col3 result
0 1 4 8 -1
1 2 3 6 -1
2 3 2 4 0
3 4 1 2 0
4 5 0 0 0
5 6 -1 -2 1
6 7 -2 -4 1
7 8 -3 -6 1
8 1 1 1 0
where the dirs
was
>>> dirs
col1 col2
0 -1 -1
1 -1 -1
2 1 -1
3 1 -1
4 1 0
5 1 1
6 1 1
7 1 1
8 0 0