I have a text file that looks like this:
ISS Unassigned Trailer 14 59 0 0 0 0 0 73
No Read 134 267 0 0 0 0 0 401
Mult Read 61 93 0 0 0 0 0 154
Closed Bag 0 0 0 0 0 0 0 0
Divert Out Position 0 0 0 0 0 0 0 0
Sorter Aux Mode 2938 0 0 0 0 0 0 2938
I want to separate the words from the numbers (and the whitespace between them). I am iterating over the file line by line.
The desired outcome for the first line:
words = ['ISS Unassigned Trailer'] # Words variable
numbers = [14, 59, 0, 0, 0, 0, 0, 73] # Numbers variable
Don't worry about iterating over the lines of text and creating a list of lists; I have that covered. All I need is a method to reliably separate the text from the numbers.
What I've tried:
words = line[:22].strip()
numbers = line[22:].split()
This works in 99% of cases, but it fails badly on the last row. Because words is defined as the first 22 characters, and the first number of that row (2938) starts before character 22, that number is cut into the words slice and read incorrectly.
edit: The iteration will look somewhat like this:
for line in file:
    words = line[:22].strip()
    numbers = line[22:].split()
    list_of_lists.append([words, numbers])
Answer:
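With str.rsplit, splitting from the right at most 8 times (once per numeric column):
words, *numbers = line.rsplit(maxsplit=8)  # label first, then the 8 numeric fields
numbers[:] = map(int, numbers)  # convert the numeric strings to ints in place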
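maxsplit=8 hard-codes the number of numeric columns. A minimal sketch wrapping the same idea in a function, assuming every row ends with the same number of integer columns (8 in the sample); the name split_row and the n_numbers parameter are hypothetical:

```python
def split_row(line, n_numbers=8):
    """Split a row into (label, numbers), assuming it ends with n_numbers integers."""
    # rsplit() with no separator splits on runs of whitespace, working from the right,
    # so at most n_numbers splits leave the label (inner spaces intact) as the first item.
    label, *fields = line.rsplit(maxsplit=n_numbers)
    return label, [int(f) for f in fields]


rows = [
    "ISS Unassigned Trailer 14 59 0 0 0 0 0 73",
    "Sorter Aux Mode 2938 0 0 0 0 0 0 2938",
]
for row in rows:
    print(split_row(row))
# ('ISS Unassigned Trailer', [14, 59, 0, 0, 0, 0, 0, 73])
# ('Sorter Aux Mode', [2938, 0, 0, 0, 0, 0, 0, 2938])
```

If a row has fewer than n_numbers values, part of the label ends up in fields and int() raises ValueError, so the mismatch fails loudly rather than silently.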
Answer:
You could use a regex to parse the different parts of the line: first match all the characters up to the numbers, then all the numbers in the line. For example:
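import re

with open('test2.dat') as file:
    for line in file:
        # Non-digits, stopping just before the whitespace that precedes the first number
        words = re.match(r'[^\d]+(?=\s+\d)', line).group(0)
        # Every run of digits on the line, converted to int
        numbers = list(map(int, re.findall(r'\d+', line)))
        print(words, numbers)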
Output (for your sample data):
ISS Unassigned Trailer [14, 59, 0, 0, 0, 0, 0, 73]
No Read [134, 267, 0, 0, 0, 0, 0, 401]
Mult Read [61, 93, 0, 0, 0, 0, 0, 154]
Closed Bag [0, 0, 0, 0, 0, 0, 0, 0]
Divert Out Position [0, 0, 0, 0, 0, 0, 0, 0]
Sorter Aux Mode [2938, 0, 0, 0, 0, 0, 0, 2938]
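The regex above assumes the label contains no digits: it matches non-digits up to the first number and collects every digit run on the line. A variation (not from the answer above) anchors the numbers to the end of the line with a single re.fullmatch, so a label such as the hypothetical "Bay 2 Overflow" keeps its digit. ROW_RE and parse_row are illustrative names:

```python
import re

# Group 1: the label (non-greedy, so it ends as early as the rest of the pattern allows).
# Group 2: one or more whitespace-separated integers; only whitespace may follow them.
ROW_RE = re.compile(r'(.*?)((?:\s+\d+)+)\s*')


def parse_row(line):
    m = ROW_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    label, tail = m.groups()
    return label.strip(), [int(n) for n in tail.split()]
```

Numbers at the very end of a label are still indistinguishable from data, so a label ending in a number (for example "Lane 3") would still lose that number to the numeric columns.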
Answer:
You could use the re module:
words = re.match(r'[A-Za-z ]+', line)[0].strip()
numbers = [int(n) for n in re.findall(r'\d+', line)]
Note that the words line will raise a TypeError if the line doesn't start with a letter or a space: re.match then returns None, which can't be indexed.
>>> words = [words]
>>> words
['ISS Unassigned Trailer']
>>> numbers
[14, 59, 0, 0, 0, 0, 0, 73]
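To avoid that TypeError, check the match object before indexing it. A minimal sketch, assuming lines that don't start with a letter should simply be skipped; parse_line is a hypothetical name:

```python
import re


def parse_line(line):
    """Return (words, numbers), or None if the line doesn't start with a letter or space."""
    m = re.match(r'[A-Za-z ]+', line)
    if m is None:
        return None  # nothing matched at the start of the line
    words = m.group(0).strip()
    numbers = [int(n) for n in re.findall(r'\d+', line)]
    return words, numbers
```

The caller can then test the result for None and skip that line instead of crashing.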
Answer:
If you have the line as a string such as "ISS Unassigned Trailer 14 59 0 0 0 0 0 73", you can find where the first number starts with str.isdigit(): line[x].isdigit() returns True if that character is a digit. Split the string at the first digit: everything before it is the text, and everything from it onward is the values. This assumes the text itself contains no digits.
Something like this:
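split_position = len(line)  # default: no digits, so the whole line is text
for i, ch in enumerate(line):
    if ch.isdigit():
        split_position = i  # index of the first digit
        break
words = line[:split_position].strip()
numbers = [int(n) for n in line[split_position:].split()]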
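The same first-digit split can be written without an explicit loop by passing a generator expression to next(), with len(line) as the default when the line has no digits. A sketch; split_at_first_digit is a hypothetical name, and like the loop it assumes the label contains no digits:

```python
def split_at_first_digit(line):
    # Index of the first digit character, or len(line) if there is none.
    pos = next((i for i, ch in enumerate(line) if ch.isdigit()), len(line))
    words = line[:pos].strip()
    numbers = [int(n) for n in line[pos:].split()]
    return words, numbers
```

str.isdigit() also accepts characters such as superscript digits that int() rejects; for plain ASCII input like this file it makes no difference.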