Grouping and regex

Time:09-28

I am trying to isolate the first and the fourth elements of this string:

string = ['Runs_WithWolves || Sat Mar 21 09:38:12 +0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\xe2\x80\xa6 || 0.7188042218 || false fact or prevention\n']

It should return only Runs_WithWolves and 0.7188042218.

For now, I have this, but it's not working:

pattern = "(.+)(?:\s\|\|\s.+)(?:\s\|\|\s.+\s\|\|\s)(.+)(?:\s\|\|\s.+)\n"
for string1 in string:
      print(re.findall(str(pattern), string1))

CodePudding user response:

Consider using re.split as follows:

import re
string = "Runs_WithWolves || Sat Mar 21 09:38:12 +0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\xe2\x80\xa6 || 0.7188042218 || false fact or prevention\n"
parts = re.split(r'\s*\|\|\s*', string)
print(parts[0])  # Runs_WithWolves
print(parts[3])  # 0.7188042218
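
Since the question keeps its records in a list, the same split can be applied to each record in turn. The following is a minimal sketch, assuming every record has at least four ||-separated fields. The helper name first_and_fourth and the shortened sample record are made up for illustration and are not part of the original answer.

```python
import re

# Delimiter: "||" together with any whitespace around it.
DELIMITER = re.compile(r"\s*\|\|\s*")


def first_and_fourth(record):
    """Return the first and fourth '||'-separated fields of a record (hypothetical helper)."""
    parts = DELIMITER.split(record.strip())
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(parts)}")
    return parts[0], parts[3]


# Shortened, made-up record in the same layout as the question's data.
records = [
    "Runs_WithWolves || Sat Mar 21 09:38:12 2020 || some tweet text || 0.7188042218 || false fact or prevention\n",
]

for record in records:
    print(first_and_fourth(record))  # ('Runs_WithWolves', '0.7188042218')
```

Raising on short records is one possible design choice; returning None instead may suit data where malformed lines are expected.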

CodePudding user response:

You do not need regex; you can use str.split:

string = 'Runs_WithWolves || Sat Mar 21 09:38:12 +0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\xe2\x80\xa6 || 0.7188042218 || false fact or prevention\n'

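new = string.split('||')

# Splitting on '||' alone keeps the surrounding spaces, so strip each field.
print(new[0].strip())  # first value: Runs_WithWolves

print(new[3].strip())  # fourth value: 0.7188042218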

If you do want to use regex:

import re

string = 'Runs_WithWolves || Sat Mar 21 09:38:12 +0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\xe2\x80\xa6 || 0.7188042218 || false fact or prevention\n'


new = re.split(r'\s+\|\|\s+', string)

print(new[0])
print(new[3])

I recommend the way without Regex because it is much easier.
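
One difference between the two approaches is easier to see side by side: splitting on the bare '||' leaves the surrounding spaces inside each field, while a regex delimiter that includes the whitespace absorbs them. This is a small sketch using a shortened, made-up sample string rather than the full record from the question.

```python
import re

# Shortened, made-up sample in the same layout as the question's data.
sample = "Runs_WithWolves || some text || more text || 0.7188042218 || label\n"

plain = sample.split("||")
with_regex = re.split(r"\s*\|\|\s*", sample.strip())

print(repr(plain[0]), repr(plain[3]))            # 'Runs_WithWolves ' ' 0.7188042218 '
print(repr(with_regex[0]), repr(with_regex[3]))  # 'Runs_WithWolves' '0.7188042218'

# Stripping each field makes the plain split give the same result.
stripped = [field.strip() for field in plain]
print(stripped == with_regex)  # True
```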

CodePudding user response:

You may try this:

^(.*?)(?:\|{2}[^|]+){2}\|{2}(.*?)(?:\|{2}|$)
  • 1st capturing group contains your first result
  • 2nd capturing group contains your fourth result

Explanation:

  1. (.*?) lazily matches everything up to the first ||.
  2. (?:\|{2}[^|]+){2} matches two occurrences of ||, each followed by its field, which skips the second and third results.
  3. \|{2} matches the third pair of pipes.
  4. (.*?) matches the fourth result.
  5. (?:\|{2}|$) matches the fourth pair of pipes or the end of the string.
  6. The first and fourth results are enclosed in capturing groups.
  7. ?: marks a non-capturing group.
  8. ^ matches the beginning of the string.
  9. $ matches the end of the string.
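
The steps above can also be written with re.VERBOSE, so that each part of the pattern carries its own comment. This is only a sketch: the group names first and fourth and the shortened sample string are illustrative additions, not part of the original answer.

```python
import re

FIELDS = re.compile(
    r"""
    ^(?P<first>.*?)      # step 1: lazily take everything before the first pair of pipes
    (?:\|{2}[^|]+){2}    # step 2: skip the second and third fields
    \|{2}                # step 3: third pair of pipes
    (?P<fourth>.*?)      # step 4: the fourth field
    (?:\|{2}|$)          # step 5: fourth pair of pipes, or end of string
    """,
    re.VERBOSE,
)

# Shortened, made-up sample in the same layout as the question's data.
sample = "Runs_WithWolves || some text || more text || 0.7188042218 || label\n"

match = FIELDS.search(sample)
if match:
    # The captured fields still carry the spaces around the pipes.
    print(match.group("first").strip())   # Runs_WithWolves
    print(match.group("fourth").strip())  # 0.7188042218
```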

Source code:

import re

regex = r"^([^|]+)(?:\|\|[^|]+){2}\|\|([^|]+)"

test_str = ("Runs_WithWolves || Sat Mar 21 09:38:12 +0000 2020 || Mid December 2019 two friends of mine had caught COVID-19 through work colleagues traveling from Muhan China, but i\\xe2\\x80\\xa6 || 0.7188042218 || false fact or prevention\n")

matches = re.finditer(regex, test_str, re.MULTILINE)

for match in matches:
    # [^|]+ also captures the spaces next to the pipes, so strip them.
    print(match.group(1).strip())
    print(match.group(2).strip())
