I am working on a routing project. A route looks like this: "CNSHG(B)-PAMIT(R)-COCTG(B)-USHOU(R)-COCTG(B)-USMSY"
and I want to break it into a nested list. A route contains multiple segments: for example, CNSHG-PAMIT is one segment transported using B, then PAMIT-COCTG is transported using R (i.e., Rail), and so on.
Input:
"CNSHG(B)-PAMIT(R)-COCTG(B)-USHOU(R)-COCTG(B)-USMSY"
The output should be like this:
[[CNSHG, PAMIT, B],[PAMIT, COCTG, R],[COCTG, USHOU, B],[USHOU, COCTG, R],[COCTG, USMSY, B]]
I have tried using regex with the code below, but it didn't work.
route.str.extract('(.)\s\((.\d )')
Thanks a lot.
CodePudding user response:
You can use
import pandas as pd
df = pd.DataFrame({'col':["CNSHG(B)-PAMIT(R)-COCTG(B)-USHOU(R)-COCTG(B)-USMSY"]})
df['result'] = df['col'].str.findall(r'(\w+)\((?=[^()]*\)-(\w+))([^()]*)\)')
Output of df['result']:
[('CNSHG', 'PAMIT', 'B'), ('PAMIT', 'COCTG', 'R'), ('COCTG', 'USHOU', 'B'), ('USHOU', 'COCTG', 'R'), ('COCTG', 'USMSY', 'B')]
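The question asks for a nested list rather than a list of tuples. A minimal sketch of that conversion, using only the standard library re module with the same pattern, could look like this. The helper name route_to_segments is hypothetical and only used for illustration.

```python
import re

# Same pattern as above: origin, "(", a lookahead that captures the next stop,
# then the transport mode and the closing ")".
SEGMENT_RE = re.compile(r'(\w+)\((?=[^()]*\)-(\w+))([^()]*)\)')

def route_to_segments(route):
    """Hypothetical helper: return [[origin, destination, mode], ...]."""
    # findall returns one tuple per match; turn each tuple into a list.
    return [list(match) for match in SEGMENT_RE.findall(route)]

route = "CNSHG(B)-PAMIT(R)-COCTG(B)-USHOU(R)-COCTG(B)-USMSY"
segments = route_to_segments(route)
print(segments)
```

In the DataFrame version, something like df['result'].apply(lambda ms: [list(m) for m in ms]) should give the same shape per row.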
See the regex demo. Details:
(\w+) - Group 1: one or more word chars
\( - a ( char
(?=[^()]*\)-(\w+)) - a positive lookahead that requires (immediately to the right of the current location):
    [^()]* - zero or more chars other than ( and )
    \)- - a )- string
    (\w+) - Group 2: one or more word chars
([^()]*) - Group 3: zero or more chars other than ( and )
\) - a ) char.
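The breakdown above can also be written as a commented pattern with re.VERBOSE, which may be easier to maintain. This is a sketch using the standard library re module instead of pandas. The shortened route string is only there to show that the last stop produces no segment of its own, because the lookahead finds no following stop.

```python
import re

SEGMENT_RE = re.compile(r"""
    (\w+)         # Group 1: origin code, one or more word chars
    \(            # an opening parenthesis
    (?=           # positive lookahead: checks the text to the right, consumes nothing
        [^()]*    #   zero or more chars other than parentheses (the mode)
        \)-       #   a closing parenthesis followed by a hyphen
        (\w+)     #   Group 2: the next stop, one or more word chars
    )
    ([^()]*)      # Group 3: the transport mode between the parentheses
    \)            # a closing parenthesis
""", re.VERBOSE)

route = "CNSHG(B)-PAMIT(R)-COCTG(B)"
matches = SEGMENT_RE.findall(route)
print(matches)
```

Because the lookahead does not consume PAMIT, the next match can start there, which is why each stop appears both as a destination and as the following origin.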