How can we split a string into a list of strings where each substring is exactly 10 characters long?

Time:08-14

How can we split a string into segments of 10 characters each?

Below is a test-case:

INPUT:
    "".join(10*str(num) for num in range(5))
    "00000000001111111111222222222233333333334444444444"

DESIRED OUTPUT:
    ["0000000000", "1111111111", "2222222222", "3333333333", "4444444444"]

Here is another test case:

INPUT:
    "abcdefghijklmnopqrstuvwxyz"

DESIRED OUTPUT:
    ['abcdefghij', 'klmnopqrst', 'uvwxyz']

Failed Solution 1

Sometimes there is a spurious empty string ('') at the end of the list.

s = "".join(10*str(num) for num in range(3))

segments = [s[k*10:(k+1)*10] for k in range(1 + len(s)//10)]

print(segments)
# prints ['0000000000', '1111111111', '2222222222', '']
# there is an extra empty string at the end of the list
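
The extra element shows up whenever len(s) is an exact multiple of 10: 1 + len(s)//10 then counts one chunk too many, and slicing past the end of a string returns '' instead of raising an error. A minimal sketch of the count, assuming a helper named chunk_count (a made-up name for illustration) that uses ceiling division:

```python
def chunk_count(length: int, size: int = 10) -> int:
    """Number of slices needed to cover `length` characters."""
    # -(-a // b) is integer ceiling division for positive b
    return -(-length // size)


for length in (0, 26, 30):
    naive = 1 + length // 10  # formula from the failed attempt
    print(length, naive, chunk_count(length))

# Slicing beyond the end of a string is not an error; it yields ''
print(repr("abc"[10:20]))
```

The two formulas agree for length 26 but differ for 0 and 30, which is exactly where the empty string appears.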

Failed Solution 2

import re # regular expressions (regex)  
# match any 10 characters 
m = re.match(".{10}/g", s)
print(m) # prints 'None'
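
A likely reason this prints None: the /g suffix is JavaScript regex-literal syntax, and in Python it is simply part of the pattern, so the pattern demands a literal "/g" after the 10 characters. Even without it, re.match returns at most one match, anchored at the start of the string. A small sketch of the difference, using only functions from the standard re module:

```python
import re

s = "".join(10 * str(num) for num in range(3))

# "/g" is just two literal characters the pattern must match
print(re.match(".{10}/g", s))  # None, because s contains no "/g"

# re.match only ever returns the first match, anchored at the start
first = re.match(".{10}", s)
print(first.group(0))

# re.findall collects every non-overlapping match
print(re.findall(".{10}", s))
```

Note that the pattern .{10} drops any trailing chunk shorter than 10 characters, so on its own it does not handle the second test case.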

There is a question here on Stack Overflow about how to do it in JavaScript.
However, my question is about Python.

A similarly worded question here on Stack Overflow had the string segments be variable-length. In my case, the segments are all of constant length.

CodePudding user response:

from math import ceil
segments = [s[k*10:(k+1)*10] for k in range(ceil(len(s)/10))]

@I'mahdi's comment is the better version:

segments = [s[k:k+10] for k in range(0, len(s), 10)]

CodePudding user response:

If the string contains no whitespace, a simple option is the textwrap module from the standard library:

import textwrap

text = "00000000001111111111222222222233333333334444444444"
print(textwrap.wrap(text, 10))

Output:

['0000000000', '1111111111', '2222222222', '3333333333', '4444444444']
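
One caveat worth checking before relying on this: textwrap.wrap is meant for wrapping prose, so it breaks lines at whitespace and does not cut every 10 characters when the input contains spaces. A quick comparison against plain slicing, with a made-up sample string:

```python
import textwrap

text = "hello world foo"  # sample input chosen for illustration

wrapped = textwrap.wrap(text, 10)
sliced = [text[i:i + 10] for i in range(0, len(text), 10)]

print(wrapped)  # lines broken at spaces, none longer than 10 characters
print(sliced)   # ['hello worl', 'd foo']
```

For input like the digit strings in the question the two approaches agree, since there is no whitespace to break on.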

CodePudding user response:

The following answer was originally left as a comment by anubhava

import re # regular expressions
pieces = re.findall('.{1,10}', stryng)

It works quite well.

test_input0   = ""
test_input1   = "abcdefnghijkrlmnopqrstuvwxyz"
test_input2 = "".join(10*str(num) for num in range(3))
test_input3 = "".join(10*str(num) for num in range(4))
test_inputs = [test_input0, test_input1, test_input2, test_input3]

We have the following results:

TEST INPUT                                  TEST OUTPUT
''                                          []
'abcdefnghijkrlmnopqrstuvwxyz'              ['abcdefnghi', 'jkrlmnopqr', 'stuvwxyz']
'000000000011111111112222222222'            ['0000000000', '1111111111', '2222222222']
'0000000000111111111122222222223333333333'  ['0000000000', '1111111111', '2222222222', '3333333333']
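
One thing to watch with the regex approach: by default "." does not match a newline, so newlines are silently dropped from the result. If the input may contain them, passing re.DOTALL should keep every character. A short sketch with a made-up input string:

```python
import re

text = "abcdef\nghijklmnop"  # sample input containing a newline

# Without re.DOTALL, "." skips the newline, so it vanishes from the output
print(re.findall(".{1,10}", text))  # ['abcdef', 'ghijklmnop']

# With re.DOTALL the newline is treated as an ordinary character
pieces = re.findall(".{1,10}", text, flags=re.DOTALL)
print(pieces)

# Joining the pieces gives back the original string
print("".join(pieces) == text)
```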

CodePudding user response:

This is by no means the most efficient solution, but you can use this:

def split_ten_chars(string: str) -> list[str]:
    sub = ''
    res = list()
    for index, char in enumerate(string):
        sub += char
        if index % 10 == 9:
            res.append(sub)
            sub = ''
    if sub:
        res.append(sub)
    return res

CodePudding user response:

Here is a sample with a list comprehension that uses slice notation text[ind:ind + n] to cut the input string into chunks of length n:

text = "00000000001111111111222222222233333333334444444444"
n = 10 # len of chunks
chunks = [text[ind:ind + n] for ind in range(0, len(text), n)]
print(chunks)

output is:

['0000000000', '1111111111', '2222222222', '3333333333', '4444444444']
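
If this is needed in more than one place, the comprehension can be wrapped in a small function. The sketch below assumes the name chunk_string and the explicit check on n, neither of which comes from the answers above:

```python
def chunk_string(text: str, n: int = 10) -> list[str]:
    """Split text into pieces of length n; the last piece may be shorter."""
    if n <= 0:
        # range() rejects a zero step and a negative step would return []
        raise ValueError("n must be a positive integer")
    return [text[i:i + n] for i in range(0, len(text), n)]


print(chunk_string("abcdefghijklmnopqrstuvwxyz"))
# ['abcdefghij', 'klmnopqrst', 'uvwxyz']
print(chunk_string(""))
# []
```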