I have three columns with thousands of rows. Numbers in column 1 and 2 change from 1 to 6. I desire to check combinations of numbers in both column 1 and 2 to divide the value in column 3 by a certain value.
1 2 3.036010
1 3 2.622544
3 1 2.622544
1 2 3.036010
2 1 3.036010
Further, column 3 will be divided by same number if values of column 1 and column 2 are swapped. For example, for 1 2 and 2 1 combinations, column 3 may be divided by same value. My present approach does the job, but I would have to write several conditions manually. What could be more efficient way to perform this task? Thanks in advance!
my_data = np.loadtxt('abc.dat')
for row in my_data:
if row[0] == 1 and row[1] == 2:
row[3]/some_value
CodePudding user response:
You could use a mask for this:
import numpy as np
my_data = np.column_stack([np.random.randint(1, 6, (1000, 2)), np.random.randn(1000)])
some_value = 123
mask = my_data[:, 0] == my_data[:, 1]
# divide
my_data[mask, 2] /= some_value
output in my_data
CodePudding user response:
Maybe using pandas is more suitable for this task, you can define conditions and apply them to tabular data without any explicit loop.
CodePudding user response:
Numpy offers np.where
which allows for vectorized test:
result = np.where(data[:, 0] == data[:, 1], data[:, 2]/some_value, data[:, 2])
or if you want to change the array in place:
data[:, 2] = np.where(data[:, 0] == data[:, 1], data[:, 2]/some_value, data[:, 2])
CodePudding user response:
If you want to combine some conditions like your code. you can use operator &
for and or |
for or in np.where
:
cond1 = my_data[:, 0] == 1 # cond is a masked Boolean array for where the first condition is satisfied
cond2 = my_data[:, 1] == 2
some_value = 10
indices = np.where(cond1 & cond2)[0] # it gets indices for where the two conditions are satisfied
# indices = np.where(cond1 | cond2)[0] # it gets indices for where at least one of the masks is satisfied
result = my_data[:, 2][indices] / some_value # operation is done on the specified indices
and if you want to modify the 2nd column in place, as Ballesta answer
my_data[:, 2][indices] = my_data[:, 2][indices] / some_value
np.logical_and
and np.logical_or
are the other modules that can handle these such conditions, too; These modules must be used as np.logical_and.reduce
and np.logical_or.reduce
if conditions are more than two.