Suppose I have a dataframe like this:
0 5 10 15 20 25 ...
action_0_Q0 0.299098 0.093973 0.761735 0.058112 0.013463 0.164322 ...
action_0_Q1 0.463095 0.468425 0.202679 0.742424 0.865005 0.479546 ...
action_0_Q2 0.237807 0.437602 0.035587 0.199465 0.121532 0.356132 ...
action_1_Q0 0.263191 0.176407 0.471295 0.082457 0.029566 0.426428 ...
action_1_Q1 0.508573 0.490355 0.431732 0.249432 0.189732 0.396947 ...
action_1_Q2 0.228236 0.333238 0.096973 0.668111 0.780702 0.176625 ...
action_2_Q0 0.256632 0.122589 0.495720 0.059918 0.824424 0.384998 ...
action_2_Q1 0.485362 0.462969 0.420790 0.211578 0.155771 0.186493 ...
action_2_Q2 0.258006 0.414442 0.083490 0.728504 0.019805 0.428509 ...
This dataframe may be very large (many rows, about 3000 columns). I need to apply a function to each column that returns a distance matrix. However, the function has to treat the column's values in groups of 3 rows, one group per action. For example, for the first column:
a = distance_function([[0.299098, 0.463095, 0.237807], [0.263191, 0.508573, 0.228236], [0.256632, 0.485362, 0.258006]])
print(a.shape)  # (3, 3)
This is not overly complicated with a for loop, but it would take a very long time. Is there a faster alternative?
CodePudding user response:
If I understand correctly, use:
df = df.apply(lambda x: distance_function(x.to_numpy().reshape(-1,3)))
If you need the values flattened:
from itertools import chain
df = df.apply(lambda x: list(chain.from_iterable(distance_function(x.to_numpy().reshape(-1,3)))))
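If the distance can be expressed with NumPy broadcasting, all columns can also be processed at once instead of column by column. The following is only a minimal sketch: `distance_function` here is a hypothetical stand-in that computes pairwise Euclidean distances (the real function in the question is not shown and may differ), and the small random frame just mimics the layout above.

```python
import numpy as np
import pandas as pd


def distance_function(points):
    # Hypothetical stand-in for the asker's function:
    # pairwise Euclidean distances between the rows of `points`.
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Small random frame with the same layout as in the question:
# 3 actions x 3 Q-values per column.
rng = np.random.default_rng(0)
index = [f"action_{a}_Q{q}" for a in range(3) for q in range(3)]
df = pd.DataFrame(rng.random((9, 4)), index=index, columns=[0, 5, 10, 15])

# Per-column version: one distance matrix per column, kept in a dict.
per_column = {
    col: distance_function(df[col].to_numpy().reshape(-1, 3))
    for col in df.columns
}

# Vectorized version: reshape to (n_actions, 3, n_columns), then broadcast
# so that every pair of actions is compared in every column at once.
values = df.to_numpy().reshape(-1, 3, df.shape[1])
diff = values[:, None, :, :] - values[None, :, :, :]
all_dists = np.sqrt((diff ** 2).sum(axis=2))  # (n_actions, n_actions, n_columns)
```

Here `all_dists[:, :, j]` is the distance matrix for the j-th column. The intermediate `diff` array grows with the square of the number of actions times the number of columns, so for a very large frame it may be necessary to process the columns in chunks.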