Python Split String Between Sub-strings Where May be Different Starting Sub-strings

Time:10-14

I have strings that look something like this:

'T1 Test 2 Sku Red Widget at 10.0'

To extract 'Red Widget' I have been using the code below:

s = 'T1 Test 2 Sku Red Widget at 10.0'
t = s[s.find('Sku ') + 4 : s.find(' at')]
print(t)

This worked fine but now the string inputs have changed so that they may contain either 'Sku' (the starting sub-string) or 'Id'.

This code obviously won't work when 'Id' is used so how can I adapt it to capture both scenarios?

CodePudding user response:

One way to do this would be with regex:

import re

s1 = 'T1 Test 2 Sku Red Widget at 10.0'
s2 = 'T1 Test 2 Id Red Widget at 10.0'

pat = r'(?:(?<=Sku\s)|(?<=Id\s)).*(?=\sat)'  # raw string so \s reaches the regex engine unchanged
print(re.search(pat, s1).group(0))  # prints Red Widget
print(re.search(pat, s2).group(0))  # also prints Red Widget

How does this work?

We make use of lookbehinds and a lookahead. The two lookbehinds at the start require that the match begin immediately after either 'Sku' or 'Id' followed by a whitespace character. The lookahead at the end requires that the match be followed by a whitespace character and then 'at'. Lookarounds check the surrounding text without consuming it, so group(0) contains only the text in between.
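The same idea can also be written with a capturing group instead of lookarounds, which some people find easier to read. This is only a sketch: the function name `extract_between` and its `starts` and `end` parameters are made up for illustration and are not part of the answer above.

```python
import re

def extract_between(s, starts=("Sku", "Id"), end="at"):
    """Return the text between any start marker and the end marker, or None."""
    # Build an alternation such as "Sku|Id"; re.escape guards against
    # markers that contain regex metacharacters.
    alternation = "|".join(re.escape(marker) for marker in starts)
    # (.*?) is lazy, so it stops at the first " at" after the marker.
    pattern = rf"\b(?:{alternation})\s+(.*?)\s+{re.escape(end)}\b"
    match = re.search(pattern, s)
    return match.group(1) if match else None

print(extract_between('T1 Test 2 Sku Red Widget at 10.0'))
print(extract_between('T1 Test 2 Id Red Widget at 10.0'))
```

One difference to be aware of: the lazy `.*?` here stops at the first ' at', whereas the greedy `.*` in the pattern above runs to the last ' at' in the string. The function also returns None when nothing matches rather than raising an AttributeError.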

CodePudding user response:

You could always add an if statement in there:

if 'Sku ' in s:
  start_substring = 'Sku '
  offset = 4
else:
  start_substring = 'Id '
  offset = 3

t = s[s.find(start_substring) + offset : s.find(' at')]
print(t)
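If more starting sub-strings turn up later, the if/else can be turned into a loop over the candidates, using len() in place of a hard-coded offset. A minimal sketch, assuming the markers always end with a space; the name `extract_after_marker` and its parameters are hypothetical:

```python
def extract_after_marker(s, markers=("Sku ", "Id "), end=" at"):
    """Return the text between the first marker found and `end`, or None."""
    for marker in markers:
        start = s.find(marker)
        if start == -1:
            continue  # this marker is not in the string, try the next one
        start += len(marker)  # skip past the marker itself
        stop = s.find(end, start)  # only look for `end` after the marker
        if stop != -1:
            return s[start:stop]
    return None

print(extract_after_marker('T1 Test 2 Sku Red Widget at 10.0'))
print(extract_after_marker('T1 Test 2 Id Red Widget at 10.0'))
```

Searching for ' at' from `start` onwards avoids picking up an ' at' that appears before the marker, and returning None makes the no-match case explicit.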

CodePudding user response:

You can also do it like this, if the text you are looking for is known in advance:

import re
s = 'T1 Test 2 Sku Red Widget at 10.0'  # or input string
re_pattern = "Red Widget"  # matches this literal text only
regex = re.compile(re_pattern)
for m in regex.finditer(s):
    print(m.group())