Extract element within string into nested list

Time:12-24

I am working on a routing project. A route looks like "CNSHG(B)-PAMIT(R)-COCTG(B)-USHOU(R)-COCTG(B)-USMSY", and I want to break it into a nested list. A route contains multiple segments: for example, CNSHG-PAMIT is one segment transported using B, PAMIT-COCTG is the next segment transported using R (i.e., Rail), and so on.

Input:

"CNSHG(B)-PAMIT(R)-COCTG(B)-USHOU(R)-COCTG(B)-USMSY"

The output should be like this:

[[CNSHG, PAMIT, B],[PAMIT, COCTG, R],[COCTG, USHOU, B],[USHOU, COCTG, R],[COCTG, USMSY, B]]

I tried using a regex with the code below, but it didn't work.

route.str.extract('(.+)\s\((.\d+)')

Thanks a lot.

CodePudding user response:

You can use

import pandas as pd
df = pd.DataFrame({'col':["CNSHG(B)-PAMIT(R)-COCTG(B)-USHOU(R)-COCTG(B)-USMSY"]})
df['result'] = df['col'].str.findall(r'(\w+)\((?=[^()]*\)-(\w+))([^()]*)\)')

Output of df['result']:

[('CNSHG', 'PAMIT', 'B'), ('PAMIT', 'COCTG', 'R'), ('COCTG', 'USHOU', 'B'), ('USHOU', 'COCTG', 'R'), ('COCTG', 'USMSY', 'B')]
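
Note that `str.findall` returns a list of tuples, while the question asks for a nested list. A minimal sketch of the conversion, assuming the same `col` column as above (the `nested` column name and the `pattern` variable are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col': ["CNSHG(B)-PAMIT(R)-COCTG(B)-USHOU(R)-COCTG(B)-USMSY"]})

# Same pattern as in the answer: origin, destination (via lookahead), transport mode
pattern = r'(\w+)\((?=[^()]*\)-(\w+))([^()]*)\)'
df['result'] = df['col'].str.findall(pattern)

# Turn each (origin, destination, mode) tuple into a list
df['nested'] = df['result'].apply(lambda segments: [list(s) for s in segments])

nested = df['nested'].iloc[0]
print(nested)
```

Each row of `nested` then holds one list per segment, in route order.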

See the regex demo. Details:

  • (\w+) - Group 1: one or more word chars
  • \( - a ( char
  • (?=[^()]*\)-(\w+)) - a positive lookahead that requires (immediately to the right of the current location):
    • [^()]* - zero or more chars other than ( and )
    • \)- - a )- string
    • (\w+) - Group 2: one or more word chars
  • ([^()]*) - Group 3: zero or more chars other than ( and )
  • \) - a ) char.
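
If the route is a plain string rather than a DataFrame column, the same pattern should work with the standard `re` module. The sketch below is only an illustration: `split_route` and `ROUTE_RE` are hypothetical names, not part of the original answer. Because the destination is captured inside a lookahead, it is not consumed, so it can be matched again as the origin of the next segment.

```python
import re

# Group 1: origin, Group 2: destination (captured in the lookahead), Group 3: mode
ROUTE_RE = re.compile(r'(\w+)\((?=[^()]*\)-(\w+))([^()]*)\)')

def split_route(route):
    """Return [origin, destination, mode] lists for each segment of the route."""
    return [[m.group(1), m.group(2), m.group(3)] for m in ROUTE_RE.finditer(route)]

segments = split_route("CNSHG(B)-PAMIT(R)-COCTG(B)-USHOU(R)-COCTG(B)-USMSY")
print(segments)
```

A string with no `(mode)` markers simply yields an empty list.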