I have the following list, and I would like to split it into several lists when the element in the list is "\n".
Input:
['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064\n','27\n','\n']
expected output:
[
['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290', '27'],
['chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064', '27']
]
I tried stripping the elements with "\n" at the end of them and used and modified the accepted answer from this post:
for i, n in enumerate(lst):
    if n != "\n":
        lst[i] = lst[i].rstrip("\n")
[item.split(",") for item in ','.join(lst).split('\n') if item]
But since I am using a comma instead of a single white space to join and split, I get "" after splitting into several lists. How can I prevent this?
[
['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290','27',''],
['','chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064','27','']
]
CodePudding user response:
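Does this work for you? It collects items into a temporary list and starts a new one each time it reaches a lone '\n'.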
list1 = ['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064\n','27\n','\n']
list2 = []
tmp = []
for item in list1:
    if item != '\n':
        tmp.append(item.rstrip('\n'))
    else:
        # A lone '\n' is a separator: close the current group instead of storing it
        list2.append(tmp)
        tmp = []
# Keep a trailing group if the input does not end with '\n'
if tmp:
    list2.append(tmp)
print(list2)
CodePudding user response:
I would recommend splitting your list with more_itertools.split_at.
Because your original list ends with the separator, '\n', splitting it will result in the final item of your list being an empty sublist. The if check excludes this.
from more_itertools import split_at
original = [
    'chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290\n',
    '27\n',
    '\n',
    'chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064\n',
    '27\n',
    '\n'
]
processed = [
    [item.rstrip() for item in sublist]
    for sublist in split_at(original, lambda i: i == '\n')
    if sublist
]
print(processed)
Output (line break added for clarity):
[['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290', '27'],
['chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064', '27']]
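If you would rather not install a third-party package, something along these lines with the standard library's itertools.groupby should give the same result. This is only a sketch: the helper name split_on_separator is made up for illustration, and it assumes a separator is an element exactly equal to '\n'. Runs of consecutive separators collapse into one group, so no empty sublists are produced and no extra if check is needed.

```python
from itertools import groupby


def split_on_separator(lines, sep="\n"):
    """Split `lines` into sublists at every element equal to `sep`.

    Hypothetical helper for illustration; trailing newlines are
    stripped from the elements that are kept.
    """
    return [
        [item.rstrip("\n") for item in group]
        # groupby yields runs of consecutive items sharing the same key:
        # True for separator runs, False for runs of real data.
        for is_sep, group in groupby(lines, key=lambda item: item == sep)
        if not is_sep  # drop the separator runs themselves
    ]


original = [
    'chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290\n',
    '27\n',
    '\n',
    'chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064\n',
    '27\n',
    '\n',
]

print(split_on_separator(original))
```

Because the grouping is driven by the key alone, this also works when the list does not end with '\n' or starts with one.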