Find words in array and get their indexes in a DataFrame in Pandas

I have a DataFrame:

import pandas as pd
data = {'token_1': [['cat', 'run','today'],['dog', 'eat', 'meat']],
        'token_2': [[ 'in', 'the' , 'morning','cat', 'run', 'today',
                      'very', 'quick'],['dog', 'eat', 'meat', 'chicken', 'from', 'bowl']]}

df = pd.DataFrame(data)

I need to find the words from column token_1 in token_2 and get their indexes, so that I end up with a list of indexes for each row. I expect this:

lst_indexes = [[3,4,5],
                [0,1,2]]

CodePudding user response：

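Use a list comprehension with enumerate to collect the indices:

# for each row, keep the positions in token_2 whose word also appears in token_1
L = [[i for i, x in enumerate(b) if x in a] for a, b in zip(df['token_1'], df['token_2'])]
print(L)

Output:

[[3, 4, 5], [0, 1, 2]]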
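
The `x in a` test above scans the token_1 list for every word in token_2. A minimal sketch of the same idea with a set for the membership test, assuming the tokens are hashable strings as in the question; the helper name `positions_in` is hypothetical and not part of the original answer. The result follows the order of token_2 and includes every occurrence of a matching word.

```python
import pandas as pd

data = {'token_1': [['cat', 'run', 'today'], ['dog', 'eat', 'meat']],
        'token_2': [['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
                    ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl']]}
df = pd.DataFrame(data)


def positions_in(tokens, wanted):
    """Return the positions in `tokens` whose word appears in `wanted`."""
    wanted_set = set(wanted)  # set lookup instead of scanning a list each time
    return [i for i, word in enumerate(tokens) if word in wanted_set]


# token_1 supplies the wanted words, token_2 is the list being searched
lst_indexes = [positions_in(b, a) for a, b in zip(df['token_1'], df['token_2'])]
print(lst_indexes)  # [[3, 4, 5], [0, 1, 2]]
```

For rows as short as the ones in the question the difference is negligible; the set only pays off when token_1 rows get long.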

CodePudding user response：

You can use a dictionary/list comprehension:

# first compute a dictionary of indices for efficiency
indices = [{w: i for i,w in enumerate(l)} for l in df['token_2']]

# then map the indices
[[d.get(x,None) for x in l] for d, l in zip(indices, df['token_1'])]

Output:

[[3, 4, 5], [0, 1, 2]]
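
Two edge cases of the dictionary approach are worth keeping in mind: if a word occurs more than once in token_2, the comprehension keeps its last position, and a token_1 word that is missing from token_2 comes back as None. The sketch below shows both effects on a made-up row and one possible variant that keeps the first position and skips missing words. The helper name `first_positions` and the sample row are assumptions for illustration, not part of the original answer.

```python
def first_positions(tokens, wanted):
    """For each word in `wanted`, return its first position in `tokens`.

    Words that never occur in `tokens` are skipped.
    """
    first_index = {}
    for i, word in enumerate(tokens):
        first_index.setdefault(word, i)  # keep the earliest position only
    return [first_index[w] for w in wanted if w in first_index]


# Hypothetical row with a repeated word ('cat') and a missing word ('bird')
tokens = ['cat', 'in', 'the', 'cat', 'run']
wanted = ['cat', 'bird', 'run']

# Same dictionary comprehension as in the answer: later positions overwrite earlier ones
last_index = {w: i for i, w in enumerate(tokens)}
as_in_answer = [last_index.get(w) for w in wanted]
print(as_in_answer)  # [3, None, 4]

first_only = first_positions(tokens, wanted)
print(first_only)  # [0, 4]
```

Which behaviour is right depends on the data; for the rows in the question both versions give the same result.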

CodePudding user response：

You can traverse the data dictionary and append the indexes to a new list:

data = {'token_1': [['cat', 'run','today'],['dog', 'eat', 'meat']],
        'token_2': [[ 'in', 'the' , 'morning','cat', 'run', 'today',
                      'very', 'quick'],['dog', 'eat', 'meat', 'chicken', 'from', 'bowl']]}

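l = []
for i in range(len(data["token_1"])):
    l.append([])
    for j in range(len(data["token_1"][i])):
        word = data["token_1"][i][j]
        # list.index raises ValueError for a missing word, so check membership first
        if word in data["token_2"][i]:
            l[i].append(data["token_2"][i].index(word))
print(l)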

Note that the other solutions are clearer and more readable; this is only an alternative to a list comprehension.

Output:

[[3, 4, 5], [0, 1, 2]]
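
One detail about the loop version: Python's `list.index` never returns -1. It raises `ValueError` when the value is absent (returning -1 is what `str.find` does). A minimal sketch of a wrapper that handles the missing case in a single scan; the helper name `index_or_none` and the extra word 'fish' are hypothetical, added only to show a miss.

```python
def index_or_none(tokens, word):
    """Return the first position of `word` in `tokens`, or None if it is absent."""
    try:
        return tokens.index(word)
    except ValueError:  # list.index signals "not found" with an exception
        return None


token_1 = ['dog', 'fish', 'meat']  # 'fish' does not occur in token_2
token_2 = ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl']

found = [index_or_none(token_2, w) for w in token_1]
print(found)  # [0, None, 2]

# Drop the misses if only real positions are wanted
kept = [i for i in found if i is not None]
print(kept)  # [0, 2]
```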