I need to extract three different parts from one string.
The pattern is:
- "C" followed by 3 digits.
- Letters and numbers of any kind. However, it is always letters followed by a single digit (e.g. "F1", "PH2", "M5").
- "S" followed by digits, which can also include special characters such as "-" and "_".
- However, the last "_" separates off an iterator, which can be discarded.
- Sometimes there is no second or third element.
Examples:
Input | Expected output
---------------------------------------------------
C001F1S15_08 => ['C001','F1','S15']
C312PH2S1-06_5-0_12 => ['C312','PH2','S1-06_5-0']
C023_05 => ['C023']
C002M5_02 => ['C002','M5']
How can this be done?
All the best
Answer:
Try this:
(C\d{3})([A-RT-Z\d]+)?(S[\d\-_]+)?(?:_\d+)
Result: https://regex101.com/r/FETn0U/1
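A minimal Python sketch of applying this pattern to the four examples from the question. The second and third groups are optional, so they come back as None when that element is missing; the sketch drops those entries to get the expected lists. The helper name extract is made up for illustration, and re.fullmatch is used instead of re.match so that a string with trailing junk is rejected (a choice of this sketch, not part of the answer).

```python
import re

# First answer's pattern; the trailing (?:_\d+) consumes the iterator without capturing it.
PATTERN = re.compile(r"(C\d{3})([A-RT-Z\d]+)?(S[\d\-_]+)?(?:_\d+)")

examples = ["C001F1S15_08", "C312PH2S1-06_5-0_12", "C023_05", "C002M5_02"]


def extract(text):
    """Return the non-missing parts of text, or None if the pattern does not fit."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    # Optional groups that did not participate in the match are None; drop them.
    return [g for g in match.groups() if g is not None]


for text in examples:
    print(text, "=>", extract(text))
```

For "C312PH2S1-06_5-0_12", the S group first runs to the end of the string and then backtracks until (?:_\d+) can match "_12", which is why the inner underscore stays in "S1-06_5-0".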
Answer:
The following pattern does what you want; the last group (the iterator) is discarded.
^(C\d{3})([A-Z]+\d)?([-a-zA-Z\d]+_[\d-]+)?(_\w+)?
See https://regex101.com/r/CKasXZ/2
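A small Python check of this pattern against the four examples from the question, assuming the quantifiers are + as shown. The helper name first_three is hypothetical. When run, the third group for the first example comes out as "S15_08": because ([-a-zA-Z\d]+_[\d-]+) requires an underscore inside the S part, it takes the iterator's underscore whenever the S part itself has none.

```python
import re

# Second answer's pattern; group 4 holds the trailing "_<iterator>" part.
PATTERN = re.compile(r"^(C\d{3})([A-Z]+\d)?([-a-zA-Z\d]+_[\d-]+)?(_\w+)?")

examples = ["C001F1S15_08", "C312PH2S1-06_5-0_12", "C023_05", "C002M5_02"]


def first_three(text):
    """Return groups 1-3 of the match, skipping the ones that did not match."""
    groups = PATTERN.match(text).groups()
    return [g for g in groups[:3] if g is not None]


for text in examples:
    print(text, "=>", first_three(text))
```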
Answer:
You can extract the values like this (using Avinash's regex):
import re
regex = re.compile(r"(C\d{3})([A-RT-Z\d]+)?(S[\d\-_]+)?(?:_\d+)")
text = "C001F1S15_08"
match = regex.match(text)
print(match.group(1)) # C001
print(match.group(2)) # F1
print(match.group(3)) # S15
print(match.groups()) # ('C001', 'F1', 'S15')
print(list(match.groups()[:3])) # ['C001', 'F1', 'S15']
See here for more information. Keep in mind that .group(0) refers to the entire match, in this case the whole input string.
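For inputs with a missing element, such as "C023_05", match.groups() contains None in the unused positions, so slicing the first three groups alone still leaves None entries. A minimal sketch of that case, showing groups() with a default value and a variant using named groups; the group names c, mid and s are made up for illustration.

```python
import re

# Same pattern with named groups; the names c, mid and s are hypothetical.
PATTERN = re.compile(r"(?P<c>C\d{3})(?P<mid>[A-RT-Z\d]+)?(?P<s>S[\d\-_]+)?(?:_\d+)")

match = PATTERN.match("C023_05")
print(match.group(0))     # C023_05 (the entire match)
print(match.groups())     # ('C023', None, None)
print(match.groups(""))   # ('C023', '', '') since missing groups use the default
print(match.groupdict())  # {'c': 'C023', 'mid': None, 's': None}
```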
Answer:
import re

lines = ["C001F1S15_08",
         "C312PH2S1-06_5-0_12",
         "C023_05",
         "C002M5_02"]
for line in lines:
    parts = line.split("_")
    if len(parts) > 1:
        parts = parts[:-1]  # drop the iterator after the last "_"
        line = "_".join(parts)
    print(line)
    print(re.findall(r"C\d{3}|S[A-Za-z0-9_@./#&+-]+|[A-Za-z]+\d+", line))
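The same approach can be wrapped in a function and checked against the expected outputs from the question. This sketch uses str.rsplit with maxsplit=1, which cuts only at the last "_" and leaves strings without an underscore unchanged, the same result as the split/join step above. The names TOKEN and extract are made up for illustration.

```python
import re

# Alternatives are tried left to right at each position: the C part, the S part
# (which may contain "_", "-" and other symbols), then letters followed by digits.
TOKEN = re.compile(r"C\d{3}|S[A-Za-z0-9_@./#&+-]+|[A-Za-z]+\d+")


def extract(text):
    """Strip the iterator after the last '_' and return all tokens."""
    head = text.rsplit("_", 1)[0]
    return TOKEN.findall(head)


expected = {
    "C001F1S15_08": ["C001", "F1", "S15"],
    "C312PH2S1-06_5-0_12": ["C312", "PH2", "S1-06_5-0"],
    "C023_05": ["C023"],
    "C002M5_02": ["C002", "M5"],
}
for text, want in expected.items():
    got = extract(text)
    print(text, "=>", got, "OK" if got == want else "MISMATCH")
```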
Answer:
import re

text = "C312PH2S1-06_5-0_12"  # example input
result = []
text = '_'.join(text.split('_')[:-1])  # Remove the iterator after the last '_' (other underscores are kept).
result.append(text[0:4])  # The first part is always 4 characters: 'C' plus 3 digits.
for i in re.findall(r'[A-Za-z]{1,2}[0-9_-]+', text[4:]):  # Each group is 1 or 2 letters followed by digits, '-' and '_'.
    result.append(i)  # Append each value individually; otherwise you get a list inside a list.
print(result)
I guess the loop can be simplified, but I don't know how.
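Since re.findall already returns a list, the loop can be replaced by concatenating that list onto the first part (result.extend(...) would also work). A minimal sketch, assuming the tokens are matched with [A-Za-z]{1,2}[0-9_-]+ so the underscores inside the S part are kept; the function name extract is hypothetical.

```python
import re


def extract(text):
    """Return the C part followed by every letters+digits token."""
    text = "_".join(text.split("_")[:-1])  # drop the iterator after the last "_"
    # findall returns a list, so no explicit append loop is needed.
    return [text[:4]] + re.findall(r"[A-Za-z]{1,2}[0-9_-]+", text[4:])


print(extract("C312PH2S1-06_5-0_12"))
```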