I have this dataframe:
import pandas as pd
data = {'day':[18,18,19,19,20],
'tokens':[['a1','a2','a3','a3'],['a1','a2','a5','a6'],['b1','b2','b3'],['b0','b2','b3'],['c1','c2','c3']]}
df = pd.DataFrame(data)
Output:
day tokens
0 18 [a1, a2, a3, a3]
1 18 [a1, a2, a5, a6]
2 19 [b1, b2, b3]
3 19 [b0, b2, b3]
4 20 [c1, c2, c3]
I want to get the position of each element in the list in the column of tokens
, and I want to get the position both from the beginning and from the end (to get two positions for each element):
Those for element a1
in the first row will be 0 first and 3 from the end (I want the positions to be row by row).
day tokens position
0 18 a1 0
1 18 a1 3
2 18 a2 1
3 18 a2 2
4 18 a3 2
5 18 a3 1
6 18 a4 3
7 18 a4 0
...
CodePudding user response:
You can try assign each index from the beginning and from the end to postion
column first
df['position'] = df['tokens'].apply(lambda lst: list(zip(range(len(lst)), reversed(range(len(lst))))))
print(df)
day tokens position
0 18 [a1, a2, a3, a3] [(0, 3), (1, 2), (2, 1), (3, 0)]
1 18 [a1, a2, a5, a6] [(0, 3), (1, 2), (2, 1), (3, 0)]
2 19 [b1, b2, b3] [(0, 2), (1, 1), (2, 0)]
3 19 [b0, b2, b3] [(0, 2), (1, 1), (2, 0)]
4 20 [c1, c2, c3] [(0, 2), (1, 1), (2, 0)]
The explode multiple list columns to rows:
df = (df.set_index(['day'])
.apply(pd.Series.explode).reset_index()
.drop_duplicates(['tokens', 'position'])
.apply(pd.Series.explode).reset_index(drop=True)
.sort_values(['day', 'tokens']))
As Ben.T kindly advised in comment, there is a more easier way to explode multiple list columns in Pandas newer than 1.3.0.
df = (df.explode(['tokens','position'])
.explode('position')
.drop_duplicates(['tokens', 'position']))
print(df)
day tokens position
0 18 a1 0
1 18 a1 3
2 18 a2 1
3 18 a2 2
4 18 a3 2
5 18 a3 1
6 18 a3 3
7 18 a3 0
8 18 a5 2
9 18 a5 1
10 18 a6 3
11 18 a6 0
18 19 b0 0
19 19 b0 2
12 19 b1 0
13 19 b1 2
14 19 b2 1
15 19 b2 1
16 19 b3 2
17 19 b3 0
20 20 c1 0
21 20 c1 2
22 20 c2 1
23 20 c2 1
24 20 c3 2
25 20 c3 0