I am trying to isolate the first and the fourth elements of the string below:
string = ['Runs_WithWolves || Sat Mar 21 09:38:12 0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\xe2\x80\xa6 || 0.7188042218 || false fact or prevention\n']
so that it would only return Runs_WithWolves and 0.7188042218.
For now, I have this, but it's not working:
pattern = "(.+)(?:\s\|\|\s.+)(?:\s\|\|\s.+\s\|\|\s)(.+)(?:\s\|\|\s.+)\n"
for string1 in string:
    print(re.findall(str(pattern), string1))
CodePudding user response:
Consider using re.split as follows:
import re
string = "Runs_WithWolves || Sat Mar 21 09:38:12 0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\xe2\x80\xa6 || 0.7188042218 || false fact or prevention\n"
parts = re.split(r'\s*\|\|\s*', string)
print(parts[0]) # Runs_WithWolves
print(parts[3]) # 0.7188042218
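To apply the same split to the list from the question, it can be wrapped in a small helper. This is only a sketch: the name first_and_fourth is made up for illustration, the sample line is shortened, and it assumes every line has at least four ||-separated fields.

```python
import re

# Hypothetical helper (not from the original post): return the 1st and 4th
# fields of a line shaped like "a || b || c || d || e".
def first_and_fourth(line):
    parts = re.split(r'\s*\|\|\s*', line.strip())
    if len(parts) < 4:
        raise ValueError("expected at least four '||'-separated fields")
    return parts[0], parts[3]

# Shortened stand-in for the list in the question.
lines = ['Runs_WithWolves || Sat Mar 21 09:38:12 0000 2020 || some tweet text || 0.7188042218 || false fact or prevention\n']

for line in lines:
    user, score = first_and_fourth(line)
    print(user, float(score))  # float() assumes the 4th field is numeric
```

Raising on short lines is a design choice; returning None instead may suit data with malformed rows better.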
CodePudding user response:
You do not need a regex; you can use split:
string= 'Runs_WithWolves || Sat Mar 21 09:38:12 0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\xe2\x80\xa6 || 0.7188042218 || false fact or prevention\n'
new = string.split('||')
print(new[0]) # First Value
print(new[3]) # Fourth Value
If you do want to use a regex:
import re
string= 'Runs_WithWolves || Sat Mar 21 09:38:12 0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\xe2\x80\xa6 || 0.7188042218 || false fact or prevention\n'
new = re.split(r'\s*\|\|\s*', string)
print(new[0])
print(new[3])
I recommend the way without Regex because it is much easier.
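One thing to watch with a plain split('||'): the pieces keep the spaces that surround the separator, whereas the regex version consumes them. A minimal sketch of the difference, using a shortened stand-in for the sample line:

```python
# Shortened stand-in for the sample line from the question.
line = 'Runs_WithWolves || Sat Mar 21 09:38:12 0000 2020 || some tweet text || 0.7188042218 || false fact or prevention\n'

raw = line.split('||')
print(repr(raw[0]))  # 'Runs_WithWolves '  (trailing space kept)
print(repr(raw[3]))  # ' 0.7188042218 '    (spaces on both sides)

# Stripping each piece gives the same fields as the regex split.
clean = [part.strip() for part in line.split('||')]
print(clean[0], clean[3])
```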
CodePudding user response:
You may try this:
^(.*?)(?:\|{2}[^|]+){2}\|{2}(.*?)(?:\|{2}|$)
- 1st capturing group contains your first result
- 2nd capturing group contains your fourth result
Explanation:
- ^ matches the beginning of the string.
- (.*?) lazily matches anything until it reaches the first ||.
- (?:\|{2}[^|]+){2} matches 2 occurrences of || together with the field that follows each one.
- \|{2} matches the third pair of pipes.
- (.*?) matches the fourth result.
- (?:\|{2}|$) matches the 4th pair of pipes or the end of the string.
- ?: marks a non-capturing group.
- $ matches the end of the string.
- The first and 4th results are enclosed in capturing groups.
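The breakdown above can also be written directly into the pattern with re.VERBOSE. This is a sketch rather than the exact pattern from the answer: the \s* pieces are an addition so that the captured groups come back without the spaces around ||, and the sample line is shortened.

```python
import re

# Same structure as the pattern above; the \s* parts are an added assumption
# to trim spaces around the captured fields.
pattern = re.compile(r"""
    ^\s*(.*?)\s*          # group 1: first field, lazily up to the first pipe pair
    (?:\|{2}[^|]+){2}     # skip the 2nd and 3rd fields, each led by a pipe pair
    \|{2}\s*(.*?)\s*      # third pipe pair, then group 2: the fourth field
    (?:\|{2}|$)           # fourth pipe pair, or end of string
""", re.VERBOSE)

# Shortened stand-in for the sample line from the question.
line = 'Runs_WithWolves || Sat Mar 21 09:38:12 0000 2020 || some tweet text || 0.7188042218 || false fact or prevention\n'

m = pattern.match(line)
if m:
    print(m.group(1))
    print(m.group(2))
```

Note that [^|]+ assumes the skipped fields never contain a single | character; if they might, splitting on || is the more robust route.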
Source code:
import re
regex = r"^([^|]+)(?:\|\|[^|]+){2}\|\|([^|]+)"
test_str = ("Runs_WithWolves || Sat Mar 21 09:38:12 0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\\xe2\\x80\\xa6 || 0.7188042218 || false fact or prevention\n")
matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
    print(match.group(1))
    print(match.group(2))