How can I keep this function from producing a list of lists within a list of lists?

I have this code segment working through a rather unsightly data file which has data that I need inserted in a neat manner.

i.e. the data file will have:

...
...
...
alphabetical text
13 42 54 67
31 12
different alphabetical text
25 41 23 76
98 45 38
...
...
...

and I need it written into a list of lists which reads:

[..., [13, 42, 54, 67, 31, 12], [25, 41, 23, 76, 98, 45, 38] ...]

I currently have this code:

if next_line[0].isalpha == True and line[0] == '1' or line[0] == '2' or line[0] == '3' or line[0] == '4' or line[0] == '5' or line[0] == '6' or line[0] == '7' or line[0] == '8' or line[0] == '9': #pardon my hard coding
    h = line.split()
    self.distances.append(h)
else:
    line_queue = []
    num_list = []
    for j in range(i, len(self.datlines)):
        check_line = self.datlines[j]
        if j != len(self.datlines)-1:
            next_check = self.datlines[j + 1]
        if check_line[0] == '1' or check_line[0] == '2' or check_line[0] == '3' or check_line[0] == '4' or check_line[0] == '5' or check_line[0] == '6' or check_line[0] == '7' or check_line[0] == '8' or check_line[0] == '9':
            h = check_line.split()
            line_queue.append(h)
            for s in line_queue:
                if s != ' ' and s != '\n':
                    num_list.append(s)
            self.distances.append(num_list)
        if check_line[0].isalpha() == True:
            break

What it gives me occasionally is a list of a list of a list as such:

[..., [13, 42, 54, 67, 31, 12], [[25, 41, 23, 76, 98, 45, 38]] ...]

I've looked through it over and over again, but I cannot find where it is coming up with the extra list layer.

What exactly here is causing this to happen and how can I fix it?

Thank you so much

CodePudding user response：

You don't need separate line and next_line. Just loop through the lines, concatenating the list of numbers when the line is numeric. When you get to an alphabetical line, append the list to the result list and clear the current list of numbers.

At the end, append the final list of numbers in case there's no alphabetical line at the end.

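curlist = []
self.distances = []

for line in self.datlines:
    if line[0].isalpha():
        # An alphabetical line closes the current group; skip empty groups
        # (e.g. when the very first line is alphabetical).
        if curlist:
            self.distances.append(curlist)
        curlist = []
    else:
        # extend (not append) keeps the group flat; int() matches the desired output.
        curlist.extend(map(int, line.split()))

if curlist:
    self.distances.append(curlist)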
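
As for where the extra layer comes from: in the question's else branch, line.split() already returns a list, that list is appended to line_queue, and each of those lists is then appended to num_list, so num_list ends up holding lists rather than numbers. Below is a minimal standalone sketch of the difference between append and extend, plus the same grouping loop as a plain function. The name group_numbers and the sample lines are made up for illustration, and it assumes every non-alphabetical line contains only whitespace-separated integers.

```python
def group_numbers(lines):
    """Collect the numbers found between alphabetical lines into flat lists."""
    groups = []
    current = []
    for line in lines:
        if line[:1].isalpha():
            # An alphabetical line ends the current group (if it has any numbers).
            if current:
                groups.append(current)
            current = []
        else:
            # extend adds the individual numbers, not the list itself.
            current.extend(int(tok) for tok in line.split())
    if current:
        groups.append(current)
    return groups


# append nests the list returned by split(); extend concatenates its items.
nested = []
nested.append("25 41".split())
flat = []
flat.extend("25 41".split())

# Hypothetical sample data in the shape described in the question.
sample = [
    "alphabetical text\n",
    "13 42 54 67\n",
    "31 12\n",
    "different alphabetical text\n",
    "25 41 23 76\n",
    "98 45 38\n",
]
result = group_numbers(sample)
print(nested)
print(flat)
print(result)
```

Here nested holds one inner list while flat holds the two strings directly, which is the same difference as between the question's num_list and the curlist above.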

CodePudding user response：

Here's another approach using regex to read and capture all the numbers in the file at once:

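import re

with open(path_to_data_file, "r") as f:
    lines = re.findall(r"[a-zA-Z ]*\n([ \d]*)", f.read())

# Build the result in cleaned_lines; an empty match marks the start of a new group.
cleaned_lines = [[]]
for line in lines:
    if line:
        cleaned_lines[-1].extend(map(int, line.split()))
    else:
        cleaned_lines.append([])

# Drop empty groups, such as the one a trailing newline would leave behind.
cleaned_lines = [group for group in cleaned_lines if group]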

With this text file:

alphabetical text
13 42 54 67
31 12
different alphabetical text
25 41 23 76
98 45 38
hello world
532 15 52
5225 321 4789
999 999 999

Output:

[[13, 42, 54, 67, 31, 12],
 [25, 41, 23, 76, 98, 45, 38],
 [532, 15, 52, 5225, 321, 4789, 999, 999, 999]]

I think there's actually a regex pattern that might be able to capture the data into separate lines but I haven't figured it out yet. Right now, the pattern returns results like this:

['13 42 54 67',
 '31 12',
 '',
 '25 41 23 76',
 '98 45 38',
 '',
 '532 15 52',
 '5225 321 4789',
 '999 999 999']

I then just split that list into groups at the empty strings.
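
One possible way to get each group in a single step is to swap re.findall for re.split and cut the text at every line that starts with a letter. This is a different technique from the pattern described above, not the missing regex itself. A rough sketch, assuming the text between alphabetical lines contains only whitespace-separated integers; the text variable stands in for the contents of the data file:

```python
import re

# Stand-in for f.read() on the sample file shown above.
text = """alphabetical text
13 42 54 67
31 12
different alphabetical text
25 41 23 76
98 45 38
hello world
532 15 52
5225 321 4789
999 999 999
"""

# With re.MULTILINE, ^ and $ match at line boundaries, so every line that
# starts with a letter becomes a separator between chunks of numbers.
chunks = re.split(r"^[A-Za-z].*$", text, flags=re.MULTILINE)

# str.split() with no arguments splits on any whitespace, including newlines,
# so each chunk flattens into a single list. Empty chunks are skipped.
groups = [[int(tok) for tok in chunk.split()] for chunk in chunks if chunk.strip()]
print(groups)
```

Because each chunk already spans all the numeric lines under one heading, no second pass over empty strings is needed.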