I ultimately want to split a string by a certain character. I tried regex, but it appeared to start escaping \, so I wanted to avoid that with another approach (all my attempts at unescaping the string failed). So I want to get all positions of a character char in a string that are not within quotes, so I can split the string accordingly.
For example, given the phrase hello-world:la\test, I want to get back 11 if char is :, as that is the only : in the string, and it is at index 11. However, re does split it, but I get ['hello-world', 'la\\test'].
EDIT: @BoarGules made me realize that re didn't actually change anything; it's just how Python displays backslashes.
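To illustrate the point in the edit, here is a minimal sketch. It assumes the input contains a literal backslash, so it is written as a raw string. The doubled backslash only shows up in the repr() that Python uses when displaying a list; the string itself still holds a single backslash.

```python
s = r'hello-world:la\test'  # raw string: a literal backslash followed by "test"
parts = s.split(':')

print(parts)          # list display uses repr(), so the backslash appears doubled
print(parts[1])       # printing the string itself shows a single backslash
print(len(parts[1]))  # 7 characters: l, a, \, t, e, s, t
```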
CodePudding user response:
Here's a function that works:
import re

def split_by_char(string, char=':'):
    # Match runs of non-delimiter characters; a quoted section counts as one unit.
    pattern = re.compile(rf'''((?:[^{re.escape(char)}"']|"[^"]*"|'[^']*')+)''')
    return [m.group(0) for m in pattern.finditer(string)]
CodePudding user response:
string = 'hello-world:la\test'
char = ':'
print(string.find(char))
Prints
11
char_index = string.find(char)
string[:char_index]
Returns
'hello-world'
string[char_index + 1:]
Returns
'la\test'
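Note that str.find returns only the first match and knows nothing about quotes. If you need every position of char outside quotes, as the question asks, a simple scan may be enough. This is only a sketch: positions_outside_quotes is a made-up helper name, and it assumes quotes are never escaped and are always balanced.

```python
def positions_outside_quotes(string, char=':'):
    """Return the indices of `char` that are not inside single or double quotes."""
    positions = []
    open_quote = None  # the quote character currently open, if any
    for i, c in enumerate(string):
        if open_quote is not None:
            # Inside a quoted section: only look for the matching closing quote.
            if c == open_quote:
                open_quote = None
        elif c in ('"', "'"):
            open_quote = c
        elif c == char:
            positions.append(i)
    return positions

print(positions_outside_quotes(r'hello-world:la\test'))  # [11]
print(positions_outside_quotes('a:"b:c":d'))             # [1, 7]
```

The returned indices can then be used to slice the string in the same way as char_index above.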
CodePudding user response:
Solution for the case you're likely encountering (a pseudo-CSV format you're hand-rolling a parser for; if you're not in that situation, it's still a likely situation for people finding this question later):
Just use the csv module.
import csv
import io

test_strings = ['field1:field2:field3', 'field1:"field2:with:embedded:colons":field3']
for s in test_strings:
    for row in csv.reader(io.StringIO(s), delimiter=':'):
        print(row)
which outputs:
['field1', 'field2', 'field3']
['field1', 'field2:with:embedded:colons', 'field3']
correctly ignoring the colons within the quoted field, requiring no kludgy, hard-to-verify hand-written regexes.
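If you only ever split one string at a time, the same idea can be wrapped in a small helper. This is a sketch rather than a definitive implementation: split_fields is a hypothetical name, and it relies on csv.reader accepting any iterable of strings, so a one-element list stands in for a file object.

```python
import csv

def split_fields(line, delimiter=':', quotechar='"'):
    """Split one line on `delimiter`, ignoring delimiters inside quoted fields."""
    # csv.reader accepts any iterable of strings, so a one-element list works.
    return next(csv.reader([line], delimiter=delimiter, quotechar=quotechar))

print(split_fields('field1:"field2:with:embedded:colons":field3'))
print(split_fields(r'hello-world:la\test'))
```

Keep in mind that csv recognizes only one quote character at a time and strips the quotes from the returned fields, so this fits best when the input really is CSV-like.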