Splitting a string into substrings in Python

Is there an efficient way to split a sequence like this without using [:] slicing?

GATAAG  G  ATAAG
        GA  TAAG
        GAT  AAG
        GATA  AG
        GATAA  G

I found something in itertools, but it does not do what I want:

import itertools
import operator

def subslices(seq):
    "Return all contiguous non-empty subslices of a sequence"
    # subslices('ABCD') --> A AB ABC ABCD B BC BCD C CD D
    slices = itertools.starmap(slice, itertools.combinations(range(len(seq) + 1), 2))
    return map(operator.getitem, itertools.repeat(seq), slices)

s = 'GATAAG'
list(subslices(s))
['G', 'GA', 'GAT', 'GATA', 'GATAA', 'GATAAG', 'A', 'AT', 'ATA', 'ATAA', 'ATAAG', 'T', 'TA', 'TAA', 'TAAG', 'A', 'AA', 'AAG', 'A', 'AG', 'G']

It is also not very readable. Another solution:

def splitting_kmer(s):
    n = len(s)
    print(n)
    for i, _ in enumerate(s, 1):
        if i == n:
            break
        print(s[:n-i], s[n-i:])

Paulo

CodePudding user response:

A simple and efficient way to get all unique substrings of a string:

sample = 'GATAAG'

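# The upper bound is len(sample) + 1 so that slices ending at the last character are included.
slices = set(sample[i:j] for i in range(len(sample)) for j in range(i + 1, len(sample) + 1))

print(slices)

Result (element order may vary):

{'AA', 'AT', 'GATA', 'A', 'GATAA', 'G', 'GA', 'TA', 'T', 'ATA', 'TAA', 'ATAA', 'GAT', 'GATAAG', 'ATAAG', 'TAAG', 'AAG', 'AG'}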

The order is arbitrary because a set is unordered by definition; a set is used here so that duplicates are dropped. If you want to keep the duplicates and a predictable order, build a list instead:

sample = 'GATAAG'

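# The upper bound is len(sample) + 1 so that slices ending at the last character are included.
slices = [sample[i:j] for i in range(len(sample)) for j in range(i + 1, len(sample) + 1)]

print(slices)

Result:

['G', 'GA', 'GAT', 'GATA', 'GATAA', 'GATAAG', 'A', 'AT', 'ATA', 'ATAA', 'ATAAG', 'T', 'TA', 'TAA', 'TAAG', 'A', 'AA', 'AAG', 'A', 'AG', 'G']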
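
The table in the question shows two-part splits (a prefix and the remaining suffix), not every substring, so a minimal sketch of that case may also help. The function names two_way_splits and ordered_unique_substrings are made up for illustration and do not come from the original post. The sketch still uses slicing, on the assumption that slicing is acceptable as long as the code stays readable. The second helper assumes Python 3.7 or later, where dict keeps insertion order, and uses that to drop duplicate substrings without losing their order of first appearance.

```python
def two_way_splits(seq):
    """Yield every (prefix, suffix) pair in which both parts are non-empty."""
    # Cut points run from 1 to len(seq) - 1, so neither part is ever empty.
    for cut in range(1, len(seq)):
        yield seq[:cut], seq[cut:]


def ordered_unique_substrings(seq):
    """Return each distinct contiguous substring once, in order of first appearance."""
    n = len(seq)
    # j goes up to n inclusive so that substrings ending at the last character are kept.
    every = (seq[i:j] for i in range(n) for j in range(i + 1, n + 1))
    # dict.fromkeys keeps the first occurrence of each key and preserves insertion order.
    return list(dict.fromkeys(every))


for prefix, suffix in two_way_splits("GATAAG"):
    print(prefix, suffix)
```

For 'GATAAG' the loop prints the five pairs from the question, starting with G ATAAG and ending with GATAA G. A generator is used so that long sequences do not have to be held in memory as a full list of pairs.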