How to get a specific index from each np.array in an array of np.arrays, fast

At the most basic I have the following dataframe:

import numpy as np
import pandas as pd

a = {'possibility' : np.array([1,2,3])}
b = {'possibility' : np.array([4,5,6])}

df = pd.DataFrame([a,b])

This gives me a dataframe of size 2x1, like so:

row 1:  np.array([1,2,3])
row 2:  np.array([4,5,6])

I have another vector of length 2, like so:

[1,2]

These are the (zero-based) indices I want from each row.
So with [1,2] I want 2 from row 1 and 6 from row 2. Ideally, my output is [2,6] as a vector of length 2.

Is this possible? I can easily run through a for loop, but I am looking for FAST approaches, ideally vectorized ones, since the data is already in pandas/numpy.

For the actual use case, I need this to work in the 300k-400k row range, and to run it inside optimization problems (hence the need for speed).

CodePudding user response：

You could convert the column to a 2-D numpy array and use np.take_along_axis:

v = np.array([1,2])               # index to take from each row
a = np.vstack(df['possibility'])  # shape (2, 3); needs equal-length arrays
np.take_along_axis(a.T, v[None], axis=0)[0]

output: array([2, 6])
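
As a possible alternative to the above (not part of the original answer), plain integer-array indexing on the stacked array should give the same result: pair each row number with the index wanted from that row. This is only a sketch; the names rows and picked are made up for illustration, and it assumes every array in the column has the same length.

```python
import numpy as np
import pandas as pd

a = {'possibility': np.array([1, 2, 3])}
b = {'possibility': np.array([4, 5, 6])}
df = pd.DataFrame([a, b])

v = np.array([1, 2])                            # index wanted from each row
arr = np.vstack(df['possibility'].tolist())     # 2-D array, shape (2, 3)
rows = np.arange(len(v))                        # hypothetical helper: [0, 1]
picked = arr[rows, v]                           # picks arr[0, 1] and arr[1, 2]
print(picked)                                   # [2 6]
```

If this runs inside an optimization loop, it is probably worth doing the np.vstack once up front and reusing the 2-D array, since only the indexing step needs to be repeated.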

CodePudding user response：

You can use enumerate on the index list to build a list of (row, position) tuples, idx. Then create a DataFrame from the "possibility" column and stack it; this produces a Series with a MultiIndex of (row, position). Finally, use idx to select the wanted values from that Series.

idx = [*enumerate([1,2])]
out = pd.DataFrame(df['possibility'].tolist()).stack()[idx].to_numpy()

Output:

array([2, 6])
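
Before relying on any of these for the 300k-400k row case, it may be worth checking a vectorized version against the plain for loop on synthetic data. The sketch below does that with a variant of the take_along_axis approach (indexing along axis=1 instead of transposing). The sizes, the random data, and the names df_big, idx, loop_result and vec_result are all assumptions for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, width = 10_000, 3                 # hypothetical sizes, scale up as needed
values = rng.integers(0, 100, size=(n_rows, width))

# Same layout as in the question: one np.array per row in 'possibility'
df_big = pd.DataFrame([{'possibility': row} for row in values])
idx = rng.integers(0, width, size=n_rows)  # index wanted from each row

# Baseline: explicit Python loop over the rows
loop_result = np.array([arr[i] for arr, i in zip(df_big['possibility'], idx)])

# Vectorized: stack once, then pick one element per row along axis 1
stacked = np.vstack(df_big['possibility'].tolist())
vec_result = np.take_along_axis(stacked, idx[:, None], axis=1)[:, 0]

assert np.array_equal(loop_result, vec_result)
```

Wrapping each half in timeit on your own data would show how much the vectorized version actually saves.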