At its most basic, I have the following DataFrame:
import numpy as np
import pandas as pd

a = {'possibility': np.array([1, 2, 3])}
b = {'possibility': np.array([4, 5, 6])}
df = pd.DataFrame([a, b])
This gives a 2x1 DataFrame, like so:
row 1: np.array([1,2,3])
row 2: np.array([4,5,6])
I also have a second vector of length 2:
[1,2]
Each entry is the (0-based) position I want to take from the array in the corresponding row.
So with [1,2] I want element 1 of row 1, which is 2, and element 2 of row 2, which is 6.
Ideally, the output is the length-2 vector [2,6].
Is this possible? I can easily do it with a for loop, but I am looking for FAST approaches, ideally vectorized ones, since the data is already in pandas/NumPy.
For reference, the real use case has roughly 300k-400k rows, and this has to run inside an optimization problem (hence the need for speed).
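The for-loop baseline mentioned above might look like the following minimal sketch. `pick_loop` is a hypothetical helper name, and the sketch assumes every array in the column is long enough for its requested position. It is mainly useful as a correctness check for any faster version.

```python
import numpy as np
import pandas as pd

def pick_loop(df, positions):
    """Hypothetical baseline: take positions[i] from the array in row i."""
    # Pair each row's array with its requested position and index them one by one.
    return np.array([arr[p] for arr, p in zip(df['possibility'], positions)])

df = pd.DataFrame({'possibility': [np.array([1, 2, 3]), np.array([4, 5, 6])]})
print(pick_loop(df, [1, 2]))  # [2 6]
```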
Answer:
You could convert the column to a 2D NumPy array and use take_along_axis:
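v = np.array([1, 2])                         # position to take from each row
a = np.vstack(df['possibility'])             # 2D array, one row per DataFrame row
np.take_along_axis(a.T, v[None], axis=0)[0]  # picks a.T[v[j], j], i.e. a[j, v[j]]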
Output:
array([2, 6])
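Since all arrays in the column have the same length, plain integer-array ("fancy") indexing on the stacked 2D array should give the same result without the transpose. This is a minimal sketch assuming equal-length arrays; the random data, its size and the helper name `pick_fancy` are illustrative only. Because `np.vstack` over a few hundred thousand small arrays is likely the costly step, it is probably worth building the 2D array once, outside the optimization loop, and reusing it.

```python
import numpy as np
import pandas as pd

def pick_fancy(a, positions):
    """Hypothetical helper: return a[i, positions[i]] for every row i of a 2D array."""
    positions = np.asarray(positions)
    rows = np.arange(len(positions))  # 0, 1, ..., n-1
    return a[rows, positions]         # pairs row i with column positions[i]

# Illustrative random data, roughly the size mentioned in the question.
rng = np.random.default_rng(0)
n, k = 300_000, 3
df = pd.DataFrame({'possibility': list(rng.integers(0, 100, size=(n, k)))})
v = rng.integers(0, k, size=n)

a = np.vstack(df['possibility'])  # build once, reuse across iterations
fast = pick_fancy(a, v)
via_take = np.take_along_axis(a.T, v[None], axis=0)[0]
print(np.array_equal(fast, via_take))  # True
```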
Answer:
You can use enumerate on the list to create a list of (row, position) tuples, idx, which will be used to index a MultiIndex Series. Then build a DataFrame from the "possibility" column and stack it; this produces a Series indexed by (row, column). Use idx to select the wanted values.
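idx = [*enumerate([1, 2])]  # [(0, 1), (1, 2)]
# .loc makes the label-based selection on the (row, column) MultiIndex explicit
out = pd.DataFrame(df['possibility'].tolist()).stack().loc[idx].to_numpy()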
Output:
array([2, 6])
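To see why selecting with idx works, it may help to print the intermediate objects. A small sketch on the question's two-row frame; the variable names `wide` and `stacked` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'possibility': [np.array([1, 2, 3]), np.array([4, 5, 6])]})

idx = [*enumerate([1, 2])]                       # (row label, column label) pairs
wide = pd.DataFrame(df['possibility'].tolist())  # one column per array position
stacked = wide.stack()                           # Series indexed by (row, column)

print(idx)                          # [(0, 1), (1, 2)]
print(stacked)
print(stacked.loc[idx].to_numpy())  # [2 6]
```

Note that this route materializes every element as its own Series entry, so on 300k-400k rows it will likely be slower than the pure NumPy indexing in the other answer.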