I would like to split a pandas Series whose values contain several file paths run together, using the '.jar' extension as the delimiter. How can I split each value into separate rows without losing the '.jar' delimiter?
Ex: 1 /opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar
Expected result: 1 /opt/abc/defg/first.jar
2 /opt/dce/efg/second.jar
3 /opt/xyz/prs/third.jar
Thanks
CodePudding user response:
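Try str.split with a positive lookbehind assertion. The lookbehind splits right after each '.jar' without consuming it, and .str[:-1] drops the empty string left over after the final '.jar'. A raw string keeps the backslash in the pattern from being read as a string escape.
>>> df['path'].str.split(r'(?<=\.jar)').str[:-1].explode()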
0 /opt/abc/defg/first.jar
0 /opt/dce/efg/second.jar
0 /opt/xyz/prs/third.jar
Name: path, dtype: object
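If you also want the rows numbered 1, 2, 3 as in the question, here is a minimal sketch. It assumes the data sits in a DataFrame column named path (the name used in the snippet above; the question does not say), and it filters out empty strings instead of slicing with .str[:-1], which also covers values that do not end in '.jar'.

```python
import pandas as pd

# Assumed layout: one DataFrame column called 'path' holding the joined paths.
df = pd.DataFrame({
    'path': ['/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar']
})

# Split right after every '.jar' (zero-width lookbehind), one piece per row.
parts = df['path'].str.split(r'(?<=\.jar)').explode()

# Drop the empty piece left after the last '.jar', then renumber from 1.
parts = parts[parts != ''].reset_index(drop=True)
parts.index = parts.index + 1

print(parts)
```

Splitting on an empty (zero-width) match needs Python 3.7 or later.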
CodePudding user response:
You can use .str.extractall with the pattern r'(.*?\.jar)'. The non-greedy .*? stops at the first '.jar' it meets, so each match is exactly one path.
import pandas as pd
s = pd.Series(['/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar'])
s.str.extractall(r'(.*?\.jar)')
                               0
  match
0 0      /opt/abc/defg/first.jar
  1      /opt/dce/efg/second.jar
  2       /opt/xyz/prs/third.jar
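Note that extractall returns a DataFrame with a (row, match) MultiIndex and a single column labelled 0, not a Series. If a flat Series is what you need, something like the following should work (the variable names matches and flat are just for illustration):

```python
import pandas as pd

s = pd.Series(['/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar'])

# One row per match; the single unnamed capture group becomes column 0.
matches = s.str.extractall(r'(.*?\.jar)')

# Take column 0 and discard the (row, match) MultiIndex.
flat = matches[0].reset_index(drop=True)

print(flat)
```

Keep the MultiIndex instead of resetting it if you need to know which original row each path came from.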
CodePudding user response:
You can split on ".jar" and add it back to each piece afterwards.
value = "/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar"
results = [i + ".jar" for i in value.split(".jar") if i != ""]
print(results)
Output:
['/opt/abc/defg/first.jar', '/opt/dce/efg/second.jar', '/opt/xyz/prs/third.jar']
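One caveat with the split-and-append approach: it adds ".jar" to every non-empty piece, so any trailing text that does not end in ".jar" would get the extension too. A plain-Python alternative, sketched here with re.findall (the helper name split_jar_paths is made up for this example), only returns pieces that really end in ".jar":

```python
import re

def split_jar_paths(value):
    """Return every shortest run of characters that ends in '.jar'."""
    # Non-greedy .*? stops at the first '.jar', so each match is one path.
    return re.findall(r'.*?\.jar', value)

value = "/opt/abc/defg/first.jar/opt/dce/efg/second.jar/opt/xyz/prs/third.jar"
results = split_jar_paths(value)
print(results)
```

Anything after the last ".jar" is simply left out of the result.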