Split a string by list of character positions

Time:12-21

Suppose you have a string:

text = "coding in python is a lot of fun"

And character positions:

positions = [(0,6),(10,16),(29,32)]

These are intervals, which cover certain words within text, i.e. coding, python and fun, respectively.

Using the character positions, how could you split the text on those words, to get this output:

['coding','in','python','is a lot of','fun']

This is just an example, but it should work for any string and any list of character positions.

I'm not looking for this:

[text[i:j] for i,j in positions]

CodePudding user response:

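I'd flatten positions to be [0,6,10,16,29,32] and then do something like:

# flatten [(0,6),(10,16),(29,32)] into [0,6,10,16,29,32]
positions = [p for pair in positions for p in pair]
positions.append(len(text))  # close the last segment at the end of the text
prev_positions = [0] + positions
words = []
for begin, end in zip(prev_positions, positions):
    words.append(text[begin:end])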

This exact code produces ['', 'coding', ' in ', 'python', ' is a lot of ', 'fun', ''], so it needs some additional work to strip the whitespace and drop the empty strings.
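One way to do that remaining cleanup, as a minimal sketch: slice between consecutive boundaries as above, then strip each piece and drop the ones that end up empty. The function name split_at_boundaries is made up for illustration, and the sketch assumes the intervals are sorted and do not overlap.

```python
def split_at_boundaries(text, positions):
    """Split text at every interval boundary, strip pieces, drop empty ones."""
    # Flatten [(0, 6), (10, 16)] into [0, 6, 10, 16] and add both ends of the text.
    bounds = [0] + [p for pair in positions for p in pair] + [len(text)]
    # Slice between each pair of consecutive boundaries.
    pieces = (text[begin:end].strip() for begin, end in zip(bounds, bounds[1:]))
    return [piece for piece in pieces if piece]


text = "coding in python is a lot of fun"
positions = [(0, 6), (10, 16), (29, 32)]
print(split_at_boundaries(text, positions))
# ['coding', 'in', 'python', 'is a lot of', 'fun']
```

Note that strip() is applied to every piece, so an interval that itself starts or ends with whitespace would be trimmed too, and an empty interval such as (5,5) simply disappears from the result.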

CodePudding user response:

The code below gives the expected split, apart from trailing spaces on the in-between pieces:

text = "coding in python is a lot of fun"
positions = [(0,6),(10,16),(29,32)]
textList = []
lastIndex = 0
for i, (start, end) in enumerate(positions):
    if i > 0:
        # text between the previous interval and this one
        textList.append(text[lastIndex:start])
    textList.append(text[start:end])
    lastIndex = end + 1  # skip the single space that follows the word
print(textList)

Output: ['coding', 'in ', 'python', 'is a lot of ', 'fun']

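Note: If the trailing spaces are not needed, you can remove them with str.strip(). The code also assumes exactly one space after each interval and ignores any text before the first interval or after the last one.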
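A variant along the same lines that does the trimming and also keeps text before the first interval and after the last one could look like the sketch below. The name split_by_intervals is hypothetical, and the code assumes the intervals do not overlap (they are sorted first, so the input order does not matter).

```python
def split_by_intervals(text, positions):
    """Return the interval substrings plus the stripped text between them."""
    parts = []
    cursor = 0  # end of the previous interval
    for start, end in sorted(positions):
        gap = text[cursor:start].strip()
        if gap:  # skip gaps that are empty or whitespace only
            parts.append(gap)
        parts.append(text[start:end])  # the interval itself is kept unchanged
        cursor = end
    tail = text[cursor:].strip()  # text after the last interval, if any
    if tail:
        parts.append(tail)
    return parts


text = "coding in python is a lot of fun"
positions = [(0, 6), (10, 16), (29, 32)]
print(split_by_intervals(text, positions))
# ['coding', 'in', 'python', 'is a lot of', 'fun']
```

Because the gap is stripped instead of skipping a fixed one character, this does not depend on how many spaces separate the words.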
