I'm trying to define a function. It currently prints each data sequence on a new line, but can I also have it filter out specific indices?
Let's say my data sequence is:
ABC//DEF//64/G//HI/55/123/JKL
Can I adjust the function to also remove the indices for the numbered data so that it prints:
['ABC', '', 'DEF', '', '', '', 'G', '', '', 'HI', '', '', 'JKL']
Or perhaps, rather than just ignoring those indices, replace them with whitespace?
Thanks!
Code below:
def split_lines(lines, delimiter):
    for line in lines:
        tokens = line.split(delimiter)
        print(tokens)
CodePudding user response:
To remove the digits you could use:
import re

def split_lines(lines, delimiter, to_remove='[0-9]'):
    for line in lines:
        tokens = line.split(delimiter)
        # strip every digit from each token
        tokens = [re.sub(to_remove, '', token) for token in tokens]
        print(tokens)
Or, to blank out only the tokens made up entirely of digits:
import re

def split_lines(lines, delimiter, to_remove='^[0-9]+$'):
    for line in lines:
        tokens = line.split(delimiter)
        # the anchored pattern matches only when the whole token is digits
        tokens = [re.sub(to_remove, '', token) for token in tokens]
        print(tokens)
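To see how the two patterns differ, here is a minimal sketch that returns the tokens instead of printing them. The helper name `clean_tokens` and the sample string (the question's input with a digit added to the last token) are assumptions made for illustration.

```python
import re

def clean_tokens(line, delimiter, to_remove):
    """Split the line and apply re.sub with the given pattern to every token."""
    return [re.sub(to_remove, '', token) for token in line.split(delimiter)]

# Hypothetical sample: the question's sequence, with "JKL" changed to "JK2L"
sample = "ABC//DEF//64/G//HI/55/123/JK2L"

# Unanchored pattern: removes every digit, even inside mixed tokens
digits_anywhere = clean_tokens(sample, "/", r"[0-9]")

# Anchored pattern: only blanks tokens that consist entirely of digits
digits_only = clean_tokens(sample, "/", r"^[0-9]+$")

print(digits_anywhere)  # last token becomes 'JKL'
print(digits_only)      # last token stays 'JK2L'
```

With the unanchored pattern a mixed token such as `JK2L` loses its digit, while the anchored pattern leaves it untouched, so which one fits depends on whether mixed tokens should be preserved.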
CodePudding user response:
Simple loops and some accumulation make it possible:
t = "ABC//DEF//64/G//HI/55/123/JK2L"
k = [[]]  # start with empty inner list
for l in t:
    if l == "/":
        k.append([])  # add new inner list
    else:
        k[-1].append(l)  # add to last inner list
# fix lists to strings or empty string
for i, v in enumerate(k):
    v = ''.join(v)  # combine inner list to string
    # store empty if string is all numbers, else store v
    k[i] = "" if v.isdigit() else v
print(k)
print([o for o in k if o])  # remove empty values from list
to get
['ABC', '', 'DEF', '', '', 'G', '', 'HI', '', '', 'JK2L']
['ABC', 'DEF', 'G', 'HI', 'JK2L']
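The same idea can probably be written more compactly with `str.split` and `str.isdigit`. This is only a sketch: the function name `blank_numeric` and its `replacement` parameter are hypothetical, added so the all-digit tokens can be swapped for whitespace as the question asks.

```python
def blank_numeric(line, delimiter="/", replacement=""):
    """Split the line and swap every all-digit token for `replacement`."""
    # ''.isdigit() is False, so tokens that are already empty pass through unchanged
    return [replacement if token.isdigit() else token
            for token in line.split(delimiter)]

tokens = blank_numeric("ABC//DEF//64/G//HI/55/123/JK2L")
print(tokens)                    # numeric tokens replaced by ''
print([t for t in tokens if t])  # empty strings dropped entirely

# Replace numeric tokens with a single space instead of an empty string
spaced = blank_numeric("ABC/64/G", replacement=" ")
print(spaced)
```

Note that `str.isdigit` also returns True for some non-ASCII digit characters, so for strictly ASCII input checks a regex such as `^[0-9]+$` may be the safer choice.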