How to efficiently check conditions on two columns and perform operation on third column in python-CodePudding

I have three columns with thousands of rows. Numbers in column 1 and 2 change from 1 to 6. I desire to check combinations of numbers in both column 1 and 2 to divide the value in column 3 by a certain value.

1     2    3.036010    
1     3    2.622544    
3     1    2.622544    
1     2    3.036010    
2     1    3.036010

Further, column 3 will be divided by same number if values of column 1 and column 2 are swapped. For example, for 1 2 and 2 1 combinations, column 3 may be divided by same value. My present approach does the job, but I would have to write several conditions manually. What could be more efficient way to perform this task? Thanks in advance!

my_data = np.loadtxt('abc.dat')

for row in my_data:    
    if row[0] == 1 and row[1] == 2:
        row[3]/some_value

CodePudding user response：

You could use a mask for this:

import numpy as np
my_data = np.column_stack([np.random.randint(1, 6, (1000, 2)), np.random.randn(1000)])
some_value = 123

mask = my_data[:, 0] == my_data[:, 1]
# divide 
my_data[mask, 2] /= some_value

output in my_data

CodePudding user response：

Maybe using pandas is more suitable for this task, you can define conditions and apply them to tabular data without any explicit loop.

CodePudding user response：

Numpy offers np.where which allows for vectorized test:

result = np.where(data[:, 0] == data[:, 1], data[:, 2]/some_value, data[:, 2])

or if you want to change the array in place:

data[:, 2] = np.where(data[:, 0] == data[:, 1], data[:, 2]/some_value, data[:, 2])

CodePudding user response：

If you want to combine some conditions like your code. you can use operator & for and or | for or in np.where:

cond1 = my_data[:, 0] == 1                    # cond is a masked Boolean array for where the first condition is satisfied
cond2 = my_data[:, 1] == 2
some_value = 10
indices = np.where(cond1 & cond2)[0]          # it gets indices for where the two conditions are satisfied
# indices = np.where(cond1 | cond2)[0]        # it gets indices for where at least one of the masks is satisfied
result = my_data[:, 2][indices] / some_value  # operation is done on the specified indices

and if you want to modify the 2nd column in place, as Ballesta answer

my_data[:, 2][indices] = my_data[:, 2][indices] / some_value

np.logical_and and np.logical_or are the other modules that can handle these such conditions, too; These modules must be used as np.logical_and.reduce and np.logical_or.reduce if conditions are more than two.