How to find the indexes of certain character not in quotes in Python?

Time:04-08

I ultimately want to split a string by a certain character. I tried a regex, but it seemed to escape the backslash (\), so I want to avoid that with another approach (all my attempts at unescaping the string failed). So I want to get every position of a character char in the string that is not within quotes, so that I can split the string at those positions.

For example, given the string hello-world:la\test, I want to get back 11 if char is :, since that is the only : in the string and it is at index 11. re does split it, but I get ['hello-world', 'la\\test'].

EDIT: @BoarGules made me realize that re didn't actually change anything; the doubled backslash is just how Python displays a single backslash in a string's repr.
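A minimal sketch of the index-finding approach the question describes: scan the string once, track whether we are inside a quoted section, and record the positions of char only when outside quotes. It assumes quotes are simple ' or " pairs (no nesting, no backslash-escaped quotes inside them); the helper names unquoted_indexes and split_at are hypothetical, not from any library.

```python
def unquoted_indexes(text, char=':', quotes='"\''):
    """Return the indexes of `char` in `text` that are not inside quotes.

    Assumes simple quote pairs: no nesting and no escaped quotes.
    """
    indexes = []
    open_quote = None  # the quote character we are currently inside, if any
    for i, c in enumerate(text):
        if open_quote:
            if c == open_quote:
                open_quote = None  # matching closing quote found
        elif c in quotes:
            open_quote = c  # entering a quoted section
        elif c == char:
            indexes.append(i)  # delimiter outside any quotes
    return indexes


def split_at(text, indexes):
    """Split `text` at the given delimiter indexes, dropping the delimiters."""
    pieces, start = [], 0
    for i in indexes:
        pieces.append(text[start:i])
        start = i + 1
    pieces.append(text[start:])
    return pieces


s = r'hello-world:la\test'  # raw string, so the backslash is a literal backslash
print(unquoted_indexes(s))               # [11]
print(split_at(s, unquoted_indexes(s)))  # ['hello-world', 'la\\test']
```

Note that the pieces keep their quotes, e.g. 'a:"b:c":d' splits into ['a', '"b:c"', 'd'].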

Answer:

Here's a function that works:

import re
def split_by_char(string, char=':'):
    # Match runs of non-delimiter, non-quote chars, "double-quoted" or 'single-quoted' sections
    pattern = re.compile(rf'''((?:[^{re.escape(char)}"']|"[^"]*"|'[^']*')+)''')
    return [m.group(0) for m in pattern.finditer(string)]
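If you need the indexes the question asks for rather than the pieces, a related regex trick is to put the quoted sections first in an alternation, so a delimiter inside quotes is consumed as part of a quoted match and only bare delimiters match on their own. This is a sketch under the same assumption as the function above (simple, unescaped ' or " quotes); the helper name unquoted_delimiter_indexes is hypothetical.

```python
import re


def unquoted_delimiter_indexes(text, char=':'):
    # Quoted sections are tried first at each position, so any delimiter
    # inside them is swallowed by the quoted match instead of matching alone.
    pattern = re.compile(rf'''"[^"]*"|'[^']*'|{re.escape(char)}''')
    return [m.start() for m in pattern.finditer(text) if m.group(0) == char]


print(unquoted_delimiter_indexes('a:"b:c":d'))            # [1, 7]
print(unquoted_delimiter_indexes(r'hello-world:la\test'))  # [11]
```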

Answer:

string = 'hello-world:la\test'  # note: '\t' in this literal is a tab; use r'...' for a literal backslash
char = ':'
print(string.find(char))

Prints

11

char_index = string.find(char)

string[:char_index]

Returns

'hello-world'

string[char_index + 1:]

Returns

'la\test'
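str.find returns only the first match, so the slicing above splits at the first colon. The standard str.partition method does the same split in one call, returning (before, separator, after). Like the find approach, the sketch below ignores quotes entirely; the comprehension that collects every index is likewise not quote-aware.

```python
string = r'hello-world:la\test'  # raw string, so the backslash is literal
char = ':'

# partition splits at the first occurrence and returns (before, sep, after)
before, sep, after = string.partition(char)
print(before)  # hello-world
print(after)   # la\test

# every index of char (this does NOT skip quoted sections)
all_indexes = [i for i, c in enumerate(string) if c == char]
print(all_indexes)  # [11]
```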

Answer:

This covers the case you're most likely in (hand-rolling a parser for a pseudo-CSV format); even if that's not your situation, it likely is for people who find this question later:

Just use the csv module.

import csv
import io

test_strings = ['field1:field2:field3', 'field1:"field2:with:embedded:colons":field3']

for s in test_strings:
    for row in csv.reader(io.StringIO(s), delimiter=':'):
        print(row)


which outputs:

['field1', 'field2', 'field3']
['field1', 'field2:with:embedded:colons', 'field3']

correctly ignoring the colons within the quoted field, requiring no kludgy, hard-to-verify hand-written regexes.
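The csv module's quotechar parameter covers single-quoted fields too, and since escapechar defaults to None, a backslash passes through untouched, which matches the question's hello-world:la\test case. A small sketch; the wrapper name split_unquoted is hypothetical. csv.reader accepts any iterable of lines, so a one-element list works in place of io.StringIO.

```python
import csv


def split_unquoted(s, delimiter=':', quotechar='"'):
    # csv.reader takes an iterable of lines; wrap the single string in a list
    return next(csv.reader([s], delimiter=delimiter, quotechar=quotechar))


print(split_unquoted("field1:'field2:with:colons':field3", quotechar="'"))
# ['field1', 'field2:with:colons', 'field3']
print(split_unquoted(r'hello-world:la\test'))
# ['hello-world', 'la\\test']
```

One design caveat: quotechar is a single character, so a string that mixes 'single' and "double" quoted fields needs one of the hand-written scanners instead. Unlike the regex approach, csv also strips the quotes from the returned fields.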
