Get elements from a column of lists by index in a pandas DataFrame

I have a dataframe:

import pandas as pd

data = {
    'id': [1, 2, 3],
    'tokens': [
        ['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
        ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
        ['mouse', 'hides', 'from', 'a', 'cat'],
    ],
}

df = pd.DataFrame(data)

I also have a list of index lists, one per row:

lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]

I want to create a new column that contains, for each row, the elements of that row's tokens list at the positions given by the matching entry of lst_index. The expected result is:

    id             tokens                                          new
0   1   [in, the, morning, cat, run, today, very, quick]    [cat, run, today]
1   2   [dog, eat, meat, chicken, from, bowl]               [dog, eat, meat]
2   3   [mouse, hides, from, a, cat]                        [from, a, cat]

CodePudding user response：

Use a simple list comprehension:

lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]

df['new'] = [[l[i] for i in idx] for idx,l in zip(lst_index, df['tokens'])]

Output:

   id                                            tokens                new
0   1  [in, the, morning, cat, run, today, very, quick]  [cat, run, today]
1   2             [dog, eat, meat, chicken, from, bowl]   [dog, eat, meat]
2   3                      [mouse, hides, from, a, cat]     [from, a, cat]
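For completeness, here is a self-contained sketch of the same idea with two guards added. The helper name pick_by_index and the choice to skip out-of-range positions are my own assumptions, not part of the question; you may prefer to let an IndexError surface instead.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "tokens": [
        ["in", "the", "morning", "cat", "run", "today", "very", "quick"],
        ["dog", "eat", "meat", "chicken", "from", "bowl"],
        ["mouse", "hides", "from", "a", "cat"],
    ],
})
lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]


def pick_by_index(tokens, indices):
    """Hypothetical helper: return tokens at the given positions.

    Positions outside the list are skipped (an assumption made for this sketch).
    """
    return [tokens[i] for i in indices if -len(tokens) <= i < len(tokens)]


# zip() stops at the shorter input, so check the lengths up front.
if len(lst_index) != len(df):
    raise ValueError("lst_index must have one entry per DataFrame row")

df["new"] = [pick_by_index(t, idx) for t, idx in zip(df["tokens"], lst_index)]
print(df)
```

With the sample data every index is in range, so the result is the same as the one-liner above.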

CodePudding user response：

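Alternatively, you can loop over the data dictionary and lst_index with explicit indices to build the new list. Note that this works on the plain dictionary rather than on the DataFrame: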

data = {
    'id': [1, 2, 3],
    'tokens': [
        ['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
        ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
        ['mouse', 'hides', 'from', 'a', 'cat'],
    ],
}
lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]
l = []

for i in range(len(data["tokens"])):
    # start an empty result list for row i
    l.append([])
    for j in range(len(lst_index[i])):
        # take the token at the j-th requested position of row i
        l[i].append(data["tokens"][i][lst_index[i][j]])

data["new"] = l
print(data)

Output:

{'id': [1, 2, 3], 'tokens': [['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'], ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'], ['mouse', 'hides', 'from', 'a', 'cat']], 'new': [['cat', 'run', 'today'], ['dog', 'eat', 'meat'], ['from', 'a', 'cat']]}
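Since the question asks for a DataFrame column, the dictionary can be passed to pd.DataFrame once the new key is filled in. A minimal sketch, assuming the same data and lst_index as above (the nested loop is written here as a comprehension for brevity):

```python
import pandas as pd

data = {
    "id": [1, 2, 3],
    "tokens": [
        ["in", "the", "morning", "cat", "run", "today", "very", "quick"],
        ["dog", "eat", "meat", "chicken", "from", "bowl"],
        ["mouse", "hides", "from", "a", "cat"],
    ],
}
lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]

# Build the new list on the plain dictionary first...
data["new"] = [
    [tokens[i] for i in idx]
    for tokens, idx in zip(data["tokens"], lst_index)
]

# ...then create the DataFrame, which now includes the extra column.
df = pd.DataFrame(data)
print(df)
```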

CodePudding user response：

This may not be the most efficient solution, but it works:

df['new'] = [[token[i] for i in index] for token, index in zip(df['tokens'], lst_index)]

   id                                            tokens                new
0   1  [in, the, morning, cat, run, today, very, quick]  [cat, run, today]
1   2             [dog, eat, meat, chicken, from, bowl]   [dog, eat, meat]
2   3                      [mouse, hides, from, a, cat]     [from, a, cat]
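If you want to try an alternative to the inner comprehension, operator.itemgetter from the standard library can fetch several positions in one call. This is only a sketch: the helper name take is hypothetical, and whether it is actually faster on your data would need to be measured.

```python
from operator import itemgetter

tokens_col = [
    ["in", "the", "morning", "cat", "run", "today", "very", "quick"],
    ["dog", "eat", "meat", "chicken", "from", "bowl"],
    ["mouse", "hides", "from", "a", "cat"],
]
lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]


def take(tokens, indices):
    """Hypothetical helper: return tokens at the given positions as a list."""
    if not indices:
        # itemgetter() needs at least one argument
        return []
    picked = itemgetter(*indices)(tokens)
    # itemgetter returns a bare item for one index and a tuple for several
    return [picked] if len(indices) == 1 else list(picked)


new = [take(tokens, idx) for tokens, idx in zip(tokens_col, lst_index)]
print(new)
```

With a DataFrame, the same helper could be used as df['new'] = [take(t, idx) for t, idx in zip(df['tokens'], lst_index)].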