Home > Back-end >  Get the position of an element in a list from a pandas dataframe column
Get the position of an element in a list from a pandas dataframe column

Time:05-04

I have this dataframe:

import pandas as pd
data  = {'day':[18,18,19,19,20],
        'tokens':[['a1','a2','a3','a3'],['a1','a2','a5','a6'],['b1','b2','b3'],['b0','b2','b3'],['c1','c2','c3']]}
df = pd.DataFrame(data)

Output:

    day    tokens
0   18  [a1, a2, a3, a3]
1   18  [a1, a2, a5, a6]
2   19  [b1, b2, b3]
3   19  [b0, b2, b3]
4   20  [c1, c2, c3]

I want to get the position of each element in the list in the column of tokens, and I want to get the position both from the beginning and from the end (to get two positions for each element): Those for element a1 in the first row will be 0 first and 3 from the end (I want the positions to be row by row).

     day    tokens  position
    0   18    a1     0
    1   18    a1     3
    2   18    a2     1
    3   18    a2     2
    4   18    a3     2
    5   18    a3     1
    6   18    a4     3
    7   18    a4     0
...

CodePudding user response:

You can try assign each index from the beginning and from the end to postion column first

df['position'] = df['tokens'].apply(lambda lst: list(zip(range(len(lst)), reversed(range(len(lst))))))
print(df)

   day            tokens                          position
0   18  [a1, a2, a3, a3]  [(0, 3), (1, 2), (2, 1), (3, 0)]
1   18  [a1, a2, a5, a6]  [(0, 3), (1, 2), (2, 1), (3, 0)]
2   19      [b1, b2, b3]          [(0, 2), (1, 1), (2, 0)]
3   19      [b0, b2, b3]          [(0, 2), (1, 1), (2, 0)]
4   20      [c1, c2, c3]          [(0, 2), (1, 1), (2, 0)]

The explode multiple list columns to rows:

df = (df.set_index(['day'])
      .apply(pd.Series.explode).reset_index()
      .drop_duplicates(['tokens', 'position'])
      .apply(pd.Series.explode).reset_index(drop=True)
      .sort_values(['day', 'tokens']))

As Ben.T kindly advised in comment, there is a more easier way to explode multiple list columns in Pandas newer than 1.3.0.

df = (df.explode(['tokens','position'])
      .explode('position')
      .drop_duplicates(['tokens', 'position']))
print(df)

    day tokens position
0    18     a1        0
1    18     a1        3
2    18     a2        1
3    18     a2        2
4    18     a3        2
5    18     a3        1
6    18     a3        3
7    18     a3        0
8    18     a5        2
9    18     a5        1
10   18     a6        3
11   18     a6        0
18   19     b0        0
19   19     b0        2
12   19     b1        0
13   19     b1        2
14   19     b2        1
15   19     b2        1
16   19     b3        2
17   19     b3        0
20   20     c1        0
21   20     c1        2
22   20     c2        1
23   20     c2        1
24   20     c3        2
25   20     c3        0
  • Related