Pandas: Get index of substring within list within column

Time:04-07

I have a dataframe with a column composed of lists, as below:

      sessionId   split
0      117200  [8=FIX.4.4, 9=401, 35=F, 34=342375]
1      117200  [8=FIX.4.4, 9=454, 35=G, 34=342374]
2      117200  [8=FIX.4.4, 9=430, 35=G, 34=342373]
3      173335  [8=FIX.4.4, 9=444, 35=G, 34=272236]
4      133911  [8=FIX.4.4, 9=359, 35=G, 34=25355]

For each row, I'd like to retrieve the index of the list element in which the substring '35=' appears. The expected result would be:

      sessionId   split                             idx
0      117200  [8=FIX.4.4, 9=401, 35=F, 34=342375]  2
1      117200  [8=FIX.4.4, 9=454, 35=G, 34=342374]  2
2      117200  [8=FIX.4.4, 9=430, 35=G, 34=342373]  2
3      173335  [8=FIX.4.4, 9=444, 35=G, 34=272236]  2
4      133911  [8=FIX.4.4, 9=359, 35=G, 34=25355]   2

Answer:

Assuming each cell holds a list of strings, the most efficient approach is likely a list comprehension:

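# For each row, take the position of the first element containing '35=';
# next() falls back to None when no element matches
df['idx'] = [next((i for i, s in enumerate(l) if '35=' in s), None)
             for l in df['split']]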

output:

   sessionId                                split  idx
0     117200  [8=FIX.4.4, 9=401, 35=F, 34=342375]    2
1     117200  [8=FIX.4.4, 9=454, 35=G, 34=342374]    2
2     117200  [8=FIX.4.4, 9=430, 35=G, 34=342373]    2
3     173335  [8=FIX.4.4, 9=444, 35=G, 34=272236]    2
4     133911   [8=FIX.4.4, 9=359, 35=G, 34=25355]    2
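One caveat: the comprehension tests whether '35=' occurs anywhere in an element, so an element whose tag merely ends in 35 (for example a made-up '135=7') would also match. If only tag 35 itself should count, a minimal sketch that checks the prefix instead, assuming every element has the form tag=value (the helper name first_tag_index and the sample rows are hypothetical):

```python
import pandas as pd


def first_tag_index(fields, tag="35"):
    """Hypothetical helper: position of the first 'tag=value' element, or None."""
    prefix = f"{tag}="
    # startswith avoids false hits on other tags that merely end in "35"
    return next((i for i, s in enumerate(fields) if s.startswith(prefix)), None)


# Illustrative data only, not from the question
df = pd.DataFrame({
    "sessionId": [117200, 173335, 133911],
    "split": [
        ["8=FIX.4.4", "9=401", "35=F", "34=342375"],
        ["8=FIX.4.4", "135=7", "35=G", "34=272236"],  # '135=7' must not match
        ["8=FIX.4.4", "9=10"],                       # no tag 35 at all
    ],
})

# Rows without a match get None, which pandas stores as NaN in a float column
df["idx"] = [first_tag_index(l) for l in df["split"]]
print(df)
```

With the plain substring test the second row would report 1 instead of 2.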

Input used:

import pandas as pd

df = pd.DataFrame({'sessionId': [117200, 117200, 117200, 173335, 133911],
                   'split': [['8=FIX.4.4', '9=401', '35=F', '34=342375'],
                             ['8=FIX.4.4', '9=454', '35=G', '34=342374'],
                             ['8=FIX.4.4', '9=430', '35=G', '34=342373'],
                             ['8=FIX.4.4', '9=444', '35=G', '34=272236'],
                             ['8=FIX.4.4', '9=359', '35=G', '34=25355']]})
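For comparison, a pandas-only alternative explodes the lists so each element gets its own row, then keeps the first matching position per original row. This is a sketch assuming the DataFrame has a unique index; it is probably not faster than the list comprehension, since explode materialises every element, but it yields NaN for rows with no match or an empty list. The sample rows below are illustrative only:

```python
import pandas as pd

# Illustrative data: one row without '35=' and one empty list
df = pd.DataFrame({
    'sessionId': [117200, 173335, 133911, 133911],
    'split': [['8=FIX.4.4', '9=401', '35=F', '34=342375'],
              ['8=FIX.4.4', '35=G'],
              ['8=FIX.4.4', '9=10'],
              []],
})

# One row per list element; the original row label is repeated
s = df['split'].explode()
# Position of each element inside its original list (0, 1, 2, ...)
pos = s.groupby(level=0).cumcount()
# Elements containing the substring; na=False handles the NaN from empty lists
mask = s.str.contains('35=', regex=False, na=False)
# First matching position per original row; unmatched rows align to NaN
df['idx'] = pos[mask.to_numpy()].groupby(level=0).first()
print(df)
```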