Python split a list into several lists for each new line character

Time:03-08

I have the following list, and I would like to split it into several lists when the element in the list is "\n".

Input:

['chain 2109 chrY 59373566   1266734 1266761 chrX 156040895   1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566   1136192 1136219 chrX 156040895   1086629 1086656 4047064\n','27\n','\n']

expected output:

[
 ['chain 2109 chrY 59373566   1266734 1266761 chrX 156040895   1198245 1198272 20769290', '27'],
 ['chain 2032 chrY 59373566   1136192 1136219 chrX 156040895   1086629 1086656 4047064', '27']
]

I tried stripping the trailing "\n" from the elements and then adapted the accepted answer from this post:

for i, n in enumerate(lst):
    if n != "\n":
        lst[i] = lst[i].rstrip("\n")

[item.split(",") for item in ','.join(lst).split('\n') if item]

But because I join and split on a comma instead of a single space, the commas next to each "\n" separator leave empty strings ("") in the resulting sublists. How can I prevent this?

[
 ['chain 2109 chrY 59373566   1266734 1266761 chrX 156040895   1198245 1198272 20769290','27',''],
 ['','chain 2032 chrY 59373566   1136192 1136219 chrX 156040895   1086629 1086656 4047064','27','']
]

CodePudding user response:

Does this work for you?

list1 = ['chain 2109 chrY 59373566   1266734 1266761 chrX 156040895   1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566   1136192 1136219 chrX 156040895   1086629 1086656 4047064\n','27\n','\n']

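list2 = []
tmp = []
for item in list1:
    if item != '\n':
        # Regular line: strip the trailing newline and add it to the current group
        tmp.append(item.rstrip('\n'))
    else:
        # A bare '\n' is a separator: close the current group and start a new one
        list2.append(tmp)
        tmp = []
if tmp:
    # Keep a final group if the input does not end with a '\n' separator
    list2.append(tmp)

print(list2)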
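
The same grouping can also be sketched with itertools.groupby from the standard library. This is only an illustrative alternative, not part of the answer above; the names lines and groups are made up for the example. It assumes that a separator is an element exactly equal to '\n'.

```python
from itertools import groupby

lines = [
    'chain 2109 chrY 59373566   1266734 1266761 chrX 156040895   1198245 1198272 20769290\n',
    '27\n',
    '\n',
    'chain 2032 chrY 59373566   1136192 1136219 chrX 156040895   1086629 1086656 4047064\n',
    '27\n',
    '\n',
]

# groupby yields runs of consecutive elements that share the same key.
# Here the key is True for a bare '\n' separator and False for everything else,
# so each run with a False key is one group of data lines.
groups = [
    [item.rstrip('\n') for item in run]
    for is_separator, run in groupby(lines, key=lambda item: item == '\n')
    if not is_separator
]

print(groups)
```

Because runs of separators are skipped entirely, consecutive '\n' elements or a trailing '\n' do not produce empty sublists with this approach.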

CodePudding user response:

I would recommend splitting your list with more_itertools.split_at (more_itertools is a third-party package, so it has to be installed first).

Because your original list ends with the separator, '\n', splitting it produces an empty sublist as the final item. The "if sublist" check in the comprehension below excludes it.

from more_itertools import split_at

original = [
    'chain 2109 chrY 59373566   1266734 1266761 chrX 156040895   1198245 1198272 20769290\n',
    '27\n',
    '\n',
    'chain 2032 chrY 59373566   1136192 1136219 chrX 156040895   1086629 1086656 4047064\n',
    '27\n',
    '\n'
]

processed = [
    [item.rstrip() for item in sublist]
    for sublist in split_at(original, lambda i: i == '\n')
    if sublist
]

print(processed)

Output (line break added for clarity):

[['chain 2109 chrY 59373566   1266734 1266761 chrX 156040895   1198245 1198272 20769290', '27'],
 ['chain 2032 chrY 59373566   1136192 1136219 chrX 156040895   1086629 1086656 4047064', '27']]
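
As for the original question of how to avoid the empty strings while keeping the join/split approach: they come from the commas that ','.join places on either side of each '\n' element, so one possible fix is to drop empty fields after splitting. The sketch below assumes that no real element contains a comma and that no real element is an empty string; the names stripped, joined and result are made up for the example.

```python
lst = [
    'chain 2109 chrY 59373566   1266734 1266761 chrX 156040895   1198245 1198272 20769290\n',
    '27\n',
    '\n',
    'chain 2032 chrY 59373566   1136192 1136219 chrX 156040895   1086629 1086656 4047064\n',
    '27\n',
    '\n',
]

# Strip the trailing newline from data lines, but keep bare '\n' separators intact.
stripped = [item if item == '\n' else item.rstrip('\n') for item in lst]

# Joining with ',' leaves a comma before and after every '\n' separator.
joined = ','.join(stripped)

# Split into chunks on '\n', then drop the empty fields those extra commas create.
result = [
    [field for field in chunk.split(',') if field]
    for chunk in joined.split('\n')
    if chunk
]

print(result)
```

If the data could contain commas, the loop-based or split_at versions above are safer because they never turn the list into a single string.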