Split multi-word string from whitespace, numbers

Time:06-08

I have a text file that looks like this:

ISS Unassigned Trailer     14    59     0     0     0     0     0      73
No Read                   134   267     0     0     0     0     0     401
Mult Read                  61    93     0     0     0     0     0     154
Closed Bag                  0     0     0     0     0     0     0       0
Divert Out Position         0     0     0     0     0     0     0       0
Sorter Aux Mode      2938     0     0     0     0     0     0    2938

I want to split the words from the whitespace and numbers. I am iterating over the file line by line.

The desired outcome for the first line:

words = ['ISS Unassigned Trailer']     # Words variable
numbers = [14, 59, 0, 0, 0, 0, 0, 73]  # Numbers variable

Don't worry about iterating over the lines of text and creating a list of lists; I have that covered. All I need is a method to reliably separate the text from the numbers.

What I've tried:

words = line[:22].strip()
numbers = line[22:].split()

This works for 99% of cases, but it fails badly on the last row. Because words is always taken from the first 22 characters, and the first number on that row (2938) starts before column 22, the slice cuts through the number and it is read incorrectly.
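
A quick way to reproduce the failure, as a sketch: the line below is rebuilt by hand to match the last sample row, and the exact padding widths are an assumption based on the sample as shown.

```python
# Last sample row, rebuilt with explicit padding (widths assumed from the sample).
line = "Sorter Aux Mode" + " " * 6 + "2938" + "     0" * 6 + "    2938"

words = line[:22].strip()     # fixed-width slice from the question
numbers = line[22:].split()

print(repr(words))   # the label ends with a stray digit
print(numbers[0])    # the first number has lost its leading digit
```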

edit: The iteration will look somewhat like this:

for line in file:
    words = line[:22].strip()
    numbers = line[22:].split()
    list_of_lists.append([words, numbers])

CodePudding user response:

With rsplit:

words, *numbers = line.rsplit(maxsplit=8)
numbers[:] = map(int, numbers)
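
For reference, here is the same idea as a small self-contained sketch, run on the problematic last row. The helper name split_line and its n_numbers parameter are made up for illustration, and it assumes every line ends with exactly eight whitespace-separated integers, as in the sample data.

```python
def split_line(line, n_numbers=8):
    """Split a label from the trailing integer columns (hypothetical helper)."""
    # rsplit counts fields from the right, so the label may contain spaces
    # and the number columns do not need to start at a fixed offset.
    words, *numbers = line.rsplit(maxsplit=n_numbers)
    return words, [int(n) for n in numbers]


sample = "Sorter Aux Mode      2938     0     0     0     0     0     0    2938\n"
words, numbers = split_line(sample)
print([words], numbers)
```

Because the split is driven by the number of columns rather than by a character position, a wide first number no longer matters. The trade-off is that a line with fewer than eight numbers would silently pull part of the label into the numbers and fail in int().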

CodePudding user response:

You could use regex to parse the different parts of the line, first matching all the characters up to the numbers, then all the numbers in the line. For example:

import re

file = open('test2.dat', 'r')
for line in file:
    words = re.match(r'[^\d]+(?=\s+\d)', line).group(0)
    numbers = list(map(int, re.findall(r'\d+', line)))
    print(words, numbers)
file.close()

Output (for your sample data):

ISS Unassigned Trailer     [14, 59, 0, 0, 0, 0, 0, 73]
No Read                   [134, 267, 0, 0, 0, 0, 0, 401]
Mult Read                  [61, 93, 0, 0, 0, 0, 0, 154]
Closed Bag                  [0, 0, 0, 0, 0, 0, 0, 0]
Divert Out Position         [0, 0, 0, 0, 0, 0, 0, 0]
Sorter Aux Mode      [2938, 0, 0, 0, 0, 0, 0, 2938]
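
A possible variant, sketched under assumptions: one compiled pattern that captures the label and the trailing run of integers together, and returns None for lines that do not fit instead of raising. The names LINE_RE and parse_line are hypothetical, and the pattern assumes the label itself does not end in a standalone number.

```python
import re

# Label (lazy), then one or more whitespace-separated integers up to the end of the line.
LINE_RE = re.compile(r"^(?P<words>.*?)\s+(?P<numbers>\d+(?:\s+\d+)*)\s*$")


def parse_line(line):
    """Return (label, [ints]) or None if the line has no trailing numbers."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("words").strip(), [int(n) for n in m.group("numbers").split()]


print(parse_line("No Read                   134   267     0     0     0     0     0     401\n"))
print(parse_line("a line without numbers"))
```

Anchoring on the end of the line also strips the padding from the label, so no separate strip() of trailing spaces is needed in the caller.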

CodePudding user response:

You could use re:

import re

words = re.match(r'[A-Za-z ]+', line)[0].strip()
numbers = [int(n) for n in re.findall(r'\d+', line)]  # findall returns strings, so convert

Note that the words line raises a TypeError if the line does not start with a letter or a space, because re.match returns None in that case and None cannot be indexed.


>>> words = [words]
>>> words
['ISS Unassigned Trailer']
>>> numbers
[14, 59, 0, 0, 0, 0, 0, 73]
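
One way to guard against the None case mentioned above, as a minimal sketch: the function name split_label is hypothetical, and falling back to an empty label is an assumption about what the caller wants.

```python
import re


def split_label(line):
    """Return (label, [ints]); the label is '' if the line has no leading text."""
    m = re.match(r"[A-Za-z ]+", line)
    words = m[0].strip() if m else ""   # avoid indexing None
    numbers = [int(n) for n in re.findall(r"\d+", line)]
    return words, numbers


print(split_label("ISS Unassigned Trailer     14    59     0     0     0     0     0      73"))
print(split_label("123 456"))
```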

CodePudding user response:

If you can read the line as a string such as "ISS Unassigned Trailer 14 59 0 0 0 0 0 73", you can use isdigit() to find the position where the first number appears. line[x].isdigit() returns True if the character at index x is a digit. At the first digit in the string, split the line in two parts: everything before that position is the text, and everything from that position on is the values.

Something like this:

split_position = 0

for i, ch in enumerate(line):
    if ch.isdigit():
        split_position = i  # index of the first digit
        break

words = line[:split_position].strip()
numbers = line[split_position:].split()
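
The same approach wrapped in a function, as a sketch: split_at_first_digit is a hypothetical name, and it assumes that everything from the first digit onward is whitespace-separated integers. A line with no digits at all is treated as label only.

```python
def split_at_first_digit(line):
    """Split a line at its first digit into (label, [ints])."""
    # Index of the first digit, or len(line) if the line contains none.
    pos = next((i for i, ch in enumerate(line) if ch.isdigit()), len(line))
    words = line[:pos].strip()
    numbers = [int(n) for n in line[pos:].split()]
    return words, numbers


print(split_at_first_digit("Sorter Aux Mode      2938     0     0     0     0     0     0    2938"))
print(split_at_first_digit("Only words"))
```

Note that this still breaks if a label itself contains a digit, since the split happens at the first digit anywhere in the line.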