Split text in Pandas and keep the delimiter

I would like to split a Series whose values each contain several file paths run together, using '.jar' as the delimiter. How can I split the values into different rows so that the '.jar' delimiter is not lost?

Ex: 1 /opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar

Expected result: 1 /opt/abc/defg/first.jar

2 /opt/dce/efg/second.jar

3 /opt/xyz/prs/third.jar

Thanks

CodePudding user response：

Try str.split with a positive lookbehind assertion, then drop the trailing empty string and explode the lists into rows:

>>> df['path'].str.split(r'(?<=\.jar)').str[:-1].explode()

0    /opt/abc/defg/first.jar
0    /opt/dce/efg/second.jar
0     /opt/xyz/prs/third.jar
Name: path, dtype: object
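
To see why the `.str[:-1]` step is there, here is a minimal sketch of the same lookbehind using the standard `re` module on a plain string (the variable names are only for illustration, and splitting on an empty match needs Python 3.7 or later). The position after the last ".jar" also matches, so the split leaves one empty string at the end.

```python
import re

value = "/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar"

# (?<=\.jar) matches the empty position right after each ".jar",
# so nothing is consumed and the delimiter stays on the left-hand piece.
pieces = re.split(r"(?<=\.jar)", value)

# The position after the final ".jar" matches too, which leaves a trailing "".
# Filtering out empty pieces still works when the string does not end in ".jar",
# whereas slicing off the last element would drop real content in that case.
paths = [p for p in pieces if p]
print(paths)
```

If some values might have text after the last ".jar", filtering out empty strings this way is probably the safer choice than `[:-1]`.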

CodePudding user response：

You can use .str.extractall with the non-greedy pattern r'(.*?\.jar)', which matches the shortest run of characters ending in ".jar":

import pandas as pd

s = pd.Series(['/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar'])
s.str.extractall(r'(.*?\.jar)')

                               0
  match                         
0 0      /opt/abc/defg/first.jar
  1      /opt/dce/efg/second.jar
  2       /opt/xyz/prs/third.jar
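
The result above is a DataFrame with a (row, match) MultiIndex. If a flat Series numbered 1, 2, 3 is wanted, as in the question, one possible follow-up is sketched below. The names `matches` and `result` are only for illustration, and the sketch assumes a single input row.

```python
import pandas as pd

s = pd.Series(["/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar"])

# extractall returns one column per capture group (here column 0)
# and one row per match, indexed by (original row, match number).
matches = s.str.extractall(r"(.*?\.jar)")

# Keep the single capture column, discard the MultiIndex,
# and renumber the rows starting from 1.
result = matches[0].reset_index(drop=True)
result.index = result.index + 1
print(result)
```

With several input rows, keeping the first index level (for example with `droplevel("match")`) would probably be more useful than renumbering, since it records which original row each path came from.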

CodePudding user response：

You can split on ".jar" and then add ".jar" back to each piece.

value = "/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar"
results = [i + ".jar" for i in value.split(".jar") if i != ""]
print(results)

Output:

['/opt/abc/defg/first.jar', '/opt/dce/efg/second.jar', '/opt/xyz/prs/third.jar']
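
One caveat with this approach: if anything follows the last ".jar", it also gets ".jar" appended. A small variant that leaves such a remainder unchanged could look like the sketch below. The helper name `split_keep_suffix` is hypothetical and not part of any library.

```python
def split_keep_suffix(value, suffix=".jar"):
    """Split value after each occurrence of suffix, keeping the suffix on each piece."""
    pieces = value.split(suffix)
    # Every piece except the last was followed by the suffix in the original string.
    results = [piece + suffix for piece in pieces[:-1]]
    # The last piece is whatever came after the final suffix; keep it as-is if non-empty.
    if pieces[-1]:
        results.append(pieces[-1])
    return results


value = "/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar"
print(split_keep_suffix(value))
```

On a Series, this could be applied with something like `s.map(split_keep_suffix).explode()` to get one path per row.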