I'm trying to split strings into 2 substrings, based on the following example. How could you split the following strings:
"FORT RANDOM FX5019"
"PE3 5KS SAINT SOMTHING"
"MM 0110 MIMA"
into
["FORT RANDOM", "FX5019"]
["SAINT SOMTHING", "PE3 5KS"]
["MIMA", "MM 0110"]
For the last example, "MM 0110 MIMA", one could just find the first "word" that contains letters and split the list from there. So "MM 0110 MIMA" would be split at the end of "0110" to make the following list: ["MIMA", "MM 0110"].
re.split('(\d+)', s) is close to what I want, for example, but it doesn't take the first letters of the substring.
I tagged this question as "regex" but it might not be the easiest way to do it? Maybe just iterating over each "word" in the string and finding the first word composed of a mix of letters and numbers would do the trick.
EDIT: If the last example is too hard, just forget it.
I have something that works for the first example but not the second:
def split_by_number(s):
    l = s.split(" ")
    for idx, word in enumerate(l):
        for char in word:
            if char.isdigit():
                break_idx = idx
                print(break_idx)
                break
    return [" ".join(l[:break_idx]), " ".join(l[break_idx:])]
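A note on why the attempt above fails on the second example: the `break` only leaves the inner loop over characters, so `break_idx` ends up as the index of the last word containing a digit, not the first. A minimal sketch of the word-by-word idea follows. It assumes the code part either ends the string (and then starts at the first word with a digit) or starts the string (and then runs through the last word with a digit). The name `split_by_code_words` and the handling of strings without digits are illustrative assumptions, not part of the original attempt.

```python
def split_by_code_words(s):
    """Return [text, code] for a string holding a text part and a code part."""
    words = s.split()
    # Indices of every word that contains at least one digit.
    digit_idx = [i for i, w in enumerate(words) if any(c.isdigit() for c in w)]
    if not digit_idx:
        # Assumption: without any digit word there is no code part.
        return [s, ""]
    first, last = digit_idx[0], digit_idx[-1]
    if last == len(words) - 1:
        # Code is at the end: it starts at the first word with a digit.
        return [" ".join(words[:first]), " ".join(words[first:])]
    # Code is at the start: it runs through the last word with a digit.
    return [" ".join(words[last + 1:]), " ".join(words[:last + 1])]


for s in ["FORT RANDOM FX5019", "PE3 5KS SAINT SOMTHING", "MM 0110 MIMA"]:
    print(split_by_code_words(s))
```

Under those assumptions, the three examples from the question should come out as the desired two-element lists.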
CodePudding user response:
Maybe I was overthinking this, but try:
import re
strs = ["FORT RANDOM FX5019", "PE3 5KS SAINT SOMTHING", "MM 0110 MIMA", "52 52 CITY", "CITY 152 52"]
for s in strs:
    print(re.sub(r'^(?:(?!.*\d\S* [^\d\n]+$)([^\d\n]+) )?(.+\d\S+)(?: (.+))?$', r'\1\3|\2', s).split('|'))
Prints:
['FORT RANDOM', 'FX5019']
['SAINT SOMTHING', 'PE3 5KS']
['MIMA', 'MM 0110']
['CITY', '52 52']
['CITY', '152 52']
This is under the assumption that you just want to split each line into exactly two elements. Also, check the online regex demo.
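For readability, the same pattern could be written with `re.VERBOSE` and read through its groups instead of `re.sub`. This is only a sketch: the names `CODE_SPLIT` and `split_code` are made up for illustration, and returning `None` for lines that do not match is an assumption. Literal spaces are written as `[ ]` because verbose mode ignores bare whitespace.

```python
import re

CODE_SPLIT = re.compile(r"""
    ^
    (?:                          # optional leading text part
        (?!.*\d\S*[ ][^\d\n]+$)  # skip it if the line ends in digit-free text after a digit word
        ([^\d\n]+)[ ]            # group 1: digit-free text, then one space
    )?
    (.+\d\S+)                    # group 2: the code part, containing at least one digit
    (?:[ ](.+))?                 # group 3: optional trailing text part
    $
""", re.VERBOSE)


def split_code(s):
    """Return [text, code], or None if the line does not match."""
    m = CODE_SPLIT.match(s)
    if m is None:
        return None
    # The text is in group 1 (code at the end) or group 3 (code at the start).
    text = (m.group(1) or "") + (m.group(3) or "")
    return [text, m.group(2)]


for s in ["FORT RANDOM FX5019", "PE3 5KS SAINT SOMTHING", "MM 0110 MIMA"]:
    print(split_code(s))
```

Reading the groups directly also avoids the `'|'` separator, which would break if the input itself contained a pipe character.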