python regex keep text between the last two occurrences of a character

Time:07-29

As the title says, I want to extract the text between the last two occurrences of a character in a string.

I have:

'9500 anti-Xa IU/ml - 0,6 ml 5700 IU -'
'120 mg/ml – 0.165 ml -'
'300-300-300 IR/ml  or  IC/ml - 10 ml -'
'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'

I want to have:

'0,6 ml 5700 IU'
'0.165 ml'
'10 ml'
'15 g'

I tried using -\s*.*- but it matches everything between the first and the last -. What is the correct regex to use?

CodePudding user response:

With search:

import re
[re.search(r'[-–]\s*([^-–]+?)\s*[-–][^-–]*$', x).group(1) for x in l]

Or split:

[re.split(r'\s+[-–]\s*', x, 2)[-2] for x in l]

output: ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']

used input:

l = ['9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
     '120 mg/ml – 0.165 ml -',
     '300-300-300 IR/ml  or  IC/ml - 10 ml -',
     'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'
    ]
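
The search one-liner above raises AttributeError on any string the pattern does not match, because re.search returns None in that case. A minimal sketch of a guarded version follows; the helper name between_last_two_dashes is hypothetical and not part of the original answer:

```python
import re

# Same pattern as the search-based answer: capture the text between the
# last two dashes (hyphen or en dash), without surrounding whitespace.
PATTERN = re.compile(r'[-–]\s*([^-–]+?)\s*[-–][^-–]*$')


def between_last_two_dashes(text):
    """Return the captured text, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml  or  IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]
results = [between_last_two_dashes(s) for s in samples]
print(results)
```

Returning None keeps the list comprehension usable on mixed input; the caller can then filter or report the strings that did not match.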


CodePudding user response:

You can use

[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)

Details:

  • [^-–—\s] - a character other than whitespace, -, – and —
  • [^-–—]*? - zero or more characters other than -, – and —, as few as possible
  • (?=\s*[-–—][^-–—]*$) - a positive lookahead that requires, immediately to the right of the current location, zero or more whitespace characters, then a -, – or — character, then zero or more characters other than -, – and — up to the end of the string.
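
As a rough illustration of how the pattern above can be applied in Python, the sketch below runs it over the sample strings from the question. Since the pattern has no capture group, the whole match (group 0) is the wanted text; the variable names are made up for the example:

```python
import re

# The lookahead pattern from this answer; no capture group is needed
# because the lookahead does not consume the closing dash.
pattern = re.compile(r'[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)')

samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml  or  IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]

# re.search returns the leftmost match; earlier candidates fail because the
# lookahead allows only one more dash before the end of the string.
extracted = [pattern.search(s).group(0) for s in samples]
print(extracted)
```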

CodePudding user response:

With your shown samples only, please try the following regex. The Python code was written and tested in Python 3.

import re

var="""9500 anti-Xa IU/ml - 0,6 ml 5700 IU -
120 mg/ml - 0.165 ml -
300-300-300 IR/ml  or  IC/ml - 10 ml -
Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -"""

[x.strip(' ') for x in re.findall(r'(?<=\s-|\s–)(.*?)(?=-)',var,re.M)]

Output will be as follows:

['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']

Explanation: re.findall with the regex r'(?<=\s-|\s–)(.*?)(?=-)' returns the text that follows a whitespace character plus a dash (- or –) and precedes the next -. The strip call then removes the leading and trailing spaces from each match to give the expected output.

CodePudding user response:

Try to also match the blank space before the last dash -:

\s\-\s(.*)\s\-

By the way, the regex101 website may help the next time you have a regex issue.

EDIT

I just noticed that you have two types of dash: the short hyphen (-) and the longer en dash (–). Try this regex instead:

\s[-–]\s(.*)\s[-–]
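
A small sketch of this pattern in Python, assuming the wanted text is read from capture group 1. Note that (.*) is greedy, so the pattern starts at the first spaced dash; that matches the samples, which each contain only one spaced dash before the final one. The variant with a leading .* is an assumption added here for strings with more separators, not part of the original answer:

```python
import re

# Pattern from this answer: starts at the FIRST " - " it can find.
greedy = re.compile(r'\s[-–]\s(.*)\s[-–]')

# Assumed variant: a leading greedy .* pushes the match to the LAST " - ".
anchored = re.compile(r'.*\s[-–]\s(.*)\s[-–]')

samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml  or  IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]
greedy_results = [greedy.search(s).group(1) for s in samples]
anchored_results = [anchored.search(s).group(1) for s in samples]

# A made-up string with three spaced dashes shows where the two differ.
tricky = 'a - b - c -'
print(greedy.search(tricky).group(1))    # text after the first separator
print(anchored.search(tricky).group(1))  # text between the last two
```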
