Regex: Help to find multiple values in string (Python)

Time:05-05

I need to extract 3 different details out of 1 string.

The pattern is:

  1. "C" followed by 3 digits.
  2. One or more letters followed by a single digit (for example "F1" or "PH2"). The letters can be anything, but the order is always letters first, then one digit.
  3. "S" followed by numbers and can include special characters like "-" and "_".
  4. However, the last "_" separates an iterator, which can be discarded
  5. Sometimes there is no second or third element.

Examples:

Input                   |      Expected output
---------------------------------------------------
C001F1S15_08            =>     ['C001','F1','S15']
C312PH2S1-06_5-0_12     =>     ['C312','PH2','S1-06_5-0']
C023_05                 =>     ['C023']
C002M5_02               =>     ['C002','M5']

How can this be done?

All the best

CodePudding user response:

Try this:

(C\d{3})([A-RT-Z\d]+)?(S[\d\-_]+)?(?:_\d+)

Result: https://regex101.com/r/FETn0U/1

CodePudding user response:

The following pattern will do what you want. We discard the last group.

^(C\d{3})([A-Z]+\d)?([-a-zA-Z\d]+_[\d-]+)?(_\w+)?

See https://regex101.com/r/CKasXZ/2

CodePudding user response:

You can extract values like this (using Avinash's regex)

import re

regex = re.compile(r"(C\d{3})([A-RT-Z\d]+)?(S[\d\-_]+)?(?:_\d+)")
text = "C001F1S15_08"
match = regex.match(text)
print(match.group(1))   # C001
print(match.group(2))   # F1
print(match.group(3))   # S15
print(match.groups())   # ('C001', 'F1', 'S15')
print(list(match.groups()[:3])) # ['C001', 'F1', 'S15']

See the documentation of Python's re module for more information. Keep in mind that .group(0) refers to the entire match, in this case the whole input string.
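The snippet above only covers an input where all three parts are present. When an optional group does not take part in the match, .groups() returns None for it, so a rough sketch that handles all four examples from the question could look like the one below. The names PATTERN and extract_parts are made up for illustration, and using fullmatch instead of match is an assumption that the whole input should follow the format.

```python
import re

# Same pattern as above; the trailing "_<digits>" iterator is matched but not captured.
PATTERN = re.compile(r"(C\d{3})([A-RT-Z\d]+)?(S[\d\-_]+)?(?:_\d+)")

def extract_parts(text):
    """Return the captured parts, skipping optional groups that did not match."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return []  # input does not follow the expected format
    return [group for group in match.groups() if group is not None]

for sample in ["C001F1S15_08", "C312PH2S1-06_5-0_12", "C023_05", "C002M5_02"]:
    print(sample, "=>", extract_parts(sample))
```

Filtering on "is not None" keeps the list short for inputs such as C023_05, where only the first group matches.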

CodePudding user response:

import re

lines = [
    "C001F1S15_08",
    "C312PH2S1-06_5-0_12",
    "C023_05",
    "C002M5_02",
]

for line in lines:
    parts = line.split("_")

    if len(parts) > 1:
        parts = parts[:-1]
    
    line = "_".join(parts)
    print(line)

    print(re.findall(r"C\d{3}|S[A-Za-z0-9_@./#&+-]+|[A-Za-z]+\d+", line))

CodePudding user response:

import re

text = "C312PH2S1-06_5-0_12"  # example input
result = []
text = text.rsplit('_', 1)[0]  # Remove everything after the last '_'.
result.append(text[0:4])  # The first part is always the first 4 characters.
for i in re.findall(r'[A-Za-z]{1,2}[0-9_-]+', text[4:]):  # Each match is 1 or 2 letters followed by digits, '-' or '_'.
    result.append(i)  # Append each match on its own; appending the whole list would give a list in a list.
print(result)  # ['C312', 'PH2', 'S1-06_5-0']

I guess you can simplify the loop, but I don't know how.
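One possible simplification, as a sketch only: re.findall already returns a list, so its matches can be unpacked straight into the result instead of being appended one by one. The helper name split_parts is made up for illustration. The pattern assumes each remaining part is 1 or 2 letters followed by digits, "-" or "_".

```python
import re

def split_parts(text):
    """Split e.g. 'C312PH2S1-06_5-0_12' into ['C312', 'PH2', 'S1-06_5-0']."""
    head = text.rsplit("_", 1)[0]  # drop the iterator after the last '_'
    # The first 4 characters are the "C" part; findall returns the remaining parts as a list.
    return [head[:4], *re.findall(r"[A-Za-z]{1,2}[0-9_-]+", head[4:])]

for sample in ["C001F1S15_08", "C312PH2S1-06_5-0_12", "C023_05", "C002M5_02"]:
    print(sample, "=>", split_parts(sample))
```

Using rsplit with a limit of 1 keeps any earlier underscores inside the "S" part intact.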
