I am trying to split a text file with multiple lines into separate variables. The text is the output of a volume listing with names, data sizes, etc., and I want to put each field into its own variable, but I can't seem to get it to work.
For example, I'm trying to split this dataset into a variable for each item:
/vol0 abcd4 Object RAID6 228.33 GB -- 400.00 GB Online
/vole1 abcd1 Object RAID6 44.19 TB 45.00 TB 45.00 TB Online
/vole2 abcd4 Object RAID6 11.27 TB 11.00 TB 12.00 TB Online
/vol3 abcd4 Object RAID6 9.50 TB -- 10.00 TB Online
/vol4 abcd1 Object RAID6 18.39 TB -- 19.10 TB Online
This is the code I've run, but I keep getting a "not enough values to unpack" error:
inputfile = "dataset_input.txt"
with open(inputfile, "r") as input:
    for row in input:
        vol, bs, obj, raid, used, uunit, quota, qunit, q2, q2unit, status = row.split()
I can split the file on whitespace with the code below, and that works. I just can't get the values into separate variables so that I can manipulate the data.
for row in input:  # run through each row in the file
    output_text = row.split()  # split the row on the default whitespace delimiter
    print(output_text)
I'm very new to Python, so I'm not sure whether this is even possible, or how complicated it is.
Answer:
Firstly, split() with no arguments splits each row on every run of whitespace, so a field such as "228.33 GB" becomes two items, and the number of items then varies from row to row: rows containing "--" produce only 10 items, fewer than the 11 variables you unpack into. This can be solved by splitting each row into exactly the number of fields you need.
Secondly, in every iteration of the for loop the same variables are overwritten with new values, so the values from the previous iteration are lost. You can solve this by appending the values to a separate list for each variable.
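Here is a simple solution in which you first read the entire contents of the text file, preprocess them, and store the processed content in the required variable lists. It assumes the columns in your file are separated by two or more spaces, while single spaces only appear inside a field (such as "228.33 GB"):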
import re

with open("dataset_input.txt", "r") as fle:
    # keep only non-blank lines
    txt = [line for line in fle if line.strip()]
n = len(txt)
for i in range(0, n):
    # remove the trailing newline and any surrounding whitespace
    txt[i] = txt[i].strip()
    # replace runs of two or more whitespace characters (the column gaps) with '#'
    txt[i] = re.sub(r'\s{2,}', '#', txt[i])
    # split the row into its fields
    txt[i] = txt[i].split('#')
# define the required variables
x1 = []
x2 = []
x3 = []
x4 = []
x5 = []
x6 = []
x7 = []
# add the values to their respective variables
for i in txt:
    x1.append(i[0])
    x2.append(i[1])
    x3.append(i[2])
    x4.append(i[3])
    x5.append(i[4])
    x6.append(i[5])
    x7.append(i[6])
print(x1, x2, x3, x4, x5, x6, x7)
Also note that, depending on what you need from the file's contents, you could improve the code by doing the list appending during the preprocessing stage itself.
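Following that suggestion, here is a minimal sketch that does the appending during preprocessing, collecting one list per column as it goes instead of calling append seven times afterwards. Like the code above, it assumes columns are separated by two or more spaces; the function name `parse_columns`, its `n_cols` parameter, and the sample lines are hypothetical.

```python
import re


def parse_columns(lines, n_cols=7):
    """Split each line on runs of 2+ whitespace characters; return one list per column.

    Assumes (hypothetically) that columns are separated by at least two spaces,
    while single spaces only occur inside a field such as "228.33 GB".
    """
    columns = [[] for _ in range(n_cols)]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = re.split(r"\s{2,}", line)
        if len(fields) != n_cols:
            raise ValueError(f"expected {n_cols} fields, got {len(fields)}: {line!r}")
        # append each field to the list for its column
        for column, value in zip(columns, fields):
            column.append(value)
    return columns


sample = [
    "/vol0   abcd4   Object RAID6   228.33 GB   --         400.00 GB   Online\n",
    "/vole1  abcd1   Object RAID6   44.19 TB    45.00 TB   45.00 TB    Online\n",
]
x1, x2, x3, x4, x5, x6, x7 = parse_columns(sample)
print(x1)
print(x4)
```

With the real file you could pass the open file object directly, e.g. `with open("dataset_input.txt") as f: x1, x2, x3, x4, x5, x6, x7 = parse_columns(f)`. Raising on a row with the wrong number of fields keeps the column lists aligned instead of silently shifting values.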
Answer:
The error "not enough values to unpack" is raised when this line runs:
vol, bs, obj, raid, used, uunit, quota, qunit, q2, q2unit, status = row.split()
The reason is that you are unpacking 11 separate elements from each row, but in the example you show, not every row contains 11 whitespace-separated words.
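To see the mismatch in isolation, here is a minimal reproduction using the first sample row from the question: it splits into 10 items, so unpacking it into 11 names raises ValueError.

```python
row = "/vol0 abcd4 Object RAID6 228.33 GB -- 400.00 GB Online"
parts = row.split()
print(len(parts))  # 10

message = None
try:
    vol, bs, obj, raid, used, uunit, quota, qunit, q2, q2unit, status = parts
except ValueError as err:
    message = str(err)
print(message)  # not enough values to unpack (expected 11, got 10)
```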
Check this out:
with open(inputfile, "r") as input:
    for row in input:
        output = row.split()
        print("this row provides {} arguments".format(len(output)))
        print(output)
The output:
this row provides 10 arguments
['/vol0', 'abcd4', 'Object', 'RAID6', '228.33', 'GB', '--', '400.00', 'GB', 'Online']
this row provides 11 arguments
['/vole1', 'abcd1', 'Object', 'RAID6', '44.19', 'TB', '45.00', 'TB', '45.00', 'TB', 'Online']
this row provides 11 arguments
['/vole2', 'abcd4', 'Object', 'RAID6', '11.27', 'TB', '11.00', 'TB', '12.00', 'TB', 'Online']
this row provides 10 arguments
['/vol3', 'abcd4', 'Object', 'RAID6', '9.50', 'TB', '--', '10.00', 'TB', 'Online']
this row provides 10 arguments
['/vol4', 'abcd1', 'Object', 'RAID6', '18.39', 'TB', '--', '19.10', 'TB', 'Online']
You then need to do some cleaning on your dataset, or maybe an if statement on the row length would help. Looking only at the small portion of the data you provided, the "--" mark appears to mean that there is no value in that column. So you can replace the "--" mark with a pair of meaningful tokens (value and unit), for example 0 and any unit. This is how you might do it:
with open(inputfile, "r") as input:
    for row in input:
        output = row.replace("--", "0 0").split()
        print("this row provides {} arguments".format(len(output)))
        print(output)
And this would be the output:
this row provides 11 arguments
['/vol0', 'abcd4', 'Object', 'RAID6', '228.33', 'GB', '0', '0', '400.00', 'GB', 'Online']
this row provides 11 arguments
['/vole1', 'abcd1', 'Object', 'RAID6', '44.19', 'TB', '45.00', 'TB', '45.00', 'TB', 'Online']
this row provides 11 arguments
['/vole2', 'abcd4', 'Object', 'RAID6', '11.27', 'TB', '11.00', 'TB', '12.00', 'TB', 'Online']
this row provides 11 arguments
['/vol3', 'abcd4', 'Object', 'RAID6', '9.50', 'TB', '0', '0', '10.00', 'TB', 'Online']
this row provides 11 arguments
['/vol4', 'abcd1', 'Object', 'RAID6', '18.39', 'TB', '0', '0', '19.10', 'TB', 'Online']
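The answer also mentions checking the row length with an if statement. Here is a minimal sketch of that alternative, assuming (from the sample rows only) that a 10-item row always has a single "--" at index 6 standing in for both a value and its unit. The helper name `normalize` and the empty-string unit are hypothetical choices.

```python
def normalize(row):
    """Return an 11-item list for a row, padding rows that use a single '--' placeholder."""
    parts = row.split()
    if len(parts) == 10 and parts[6] == "--":
        # hypothetical choice: keep '--' as the value and use '' for the missing unit
        parts.insert(7, "")
    if len(parts) != 11:
        raise ValueError(f"unexpected row layout: {row!r}")
    return parts


rows = [
    "/vol0 abcd4 Object RAID6 228.33 GB -- 400.00 GB Online",
    "/vole1 abcd1 Object RAID6 44.19 TB 45.00 TB 45.00 TB Online",
]
parsed = [normalize(row) for row in rows]
for fields in parsed:
    vol, bs, obj, raid, used, uunit, quota, qunit, q2, q2unit, status = fields
    print(vol, used, uunit, quota, qunit, q2, q2unit, status)
```

Raising on any other layout means an unexpected line fails loudly instead of being unpacked into the wrong variables.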
Answer:
It looks to me like your data is a list of fixed-length records, and rather than using split() you might take slices based on your fixed-length fields. Ultimately, I would look at implementing this with Python's struct module, but this might get you started processing a fixed-length record.
Let's start with some example data read from your file, and define a list of fixed-length field specifications.
data = [
    "/vol0 abcd4 Object RAID6 228.33 GB -- 400.00 GB Online",
    "/vole1 abcd1 Object RAID6 44.19 TB 45.00 TB 45.00 TB Online",
    "/vole2 abcd4 Object RAID6 11.27 TB 11.00 TB 12.00 TB Online",
    "/vol3 abcd4 Object RAID6 9.50 TB -- 10.00 TB Online",
    "/vol4 abcd1 Object RAID6 18.39 TB -- 19.10 TB Online"
]
##------------------------------
## Only you know for sure where each field in this fixed-length record starts and stops.
##------------------------------
fields = [
    {"name": "path", "starts_at": 0, "width": 37},
    {"name": "abc", "starts_at": 37, "width": 5},
    {"name": "type", "starts_at": 47, "width": 13},
    {"name": "size", "starts_at": 60, "width": 11},
    # ....
]
##------------------------------
Now, given your rows of data and the field definitions, we can create a list of lists.
##------------------------------
## reshape as a list of lists
##------------------------------
data2 = [
    [
        row[field["starts_at"] : field["starts_at"] + field["width"]].strip()
        for field in fields
    ]
    for row in data
]
print(data2)
##------------------------------
This should give you (line breaks added for readability):
[
['/vol0', 'abcd4', 'Object RAID6', '228.33 GB'],
['/vole1', 'abcd1', 'Object RAID6', '44.19 TB'],
['/vole2', 'abcd4', 'Object RAID6', '11.27 TB'],
['/vol3', 'abcd4', 'Object RAID6', '9.50 TB'],
['/vol4', 'abcd1', 'Object RAID6', '18.39 TB']
]
I would rather work with a list of dicts if possible, so given the data and field definitions above, I might use them like this...
##------------------------------
## reshape as a list of dict
##------------------------------
import json  # only for printing a nice output

data2 = [
    {
        field["name"]: row[field["starts_at"] : field["starts_at"] + field["width"]].strip()
        for field in fields
    }
    for row in data
]
print(json.dumps(data2, indent=2))
##------------------------------
Giving you:
[
  {
    "path": "/vol0",
    "abc": "abcd4",
    "type": "Object RAID6",
    "size": "228.33 GB"
  },
  {
    "path": "/vole1",
    "abc": "abcd1",
    "type": "Object RAID6",
    "size": "44.19 TB"
  },
  {
    "path": "/vole2",
    "abc": "abcd4",
    "type": "Object RAID6",
    "size": "11.27 TB"
  },
  {
    "path": "/vol3",
    "abc": "abcd4",
    "type": "Object RAID6",
    "size": "9.50 TB"
  },
  {
    "path": "/vol4",
    "abc": "abcd1",
    "type": "Object RAID6",
    "size": "18.39 TB"
  }
]
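The answer suggests Python's `struct` module as the eventual tool for fixed-length records. Here is a minimal sketch of that idea under stated assumptions: the column widths in `WIDTHS` and the names in `NAMES` are hypothetical (measure the real ones from your file), and the data is assumed to be ASCII, since `struct` works on bytes rather than str.

```python
import struct

# Hypothetical column widths in characters; measure the real ones from your file.
WIDTHS = [8, 8, 15, 12, 11, 12, 6]
NAMES = ["path", "abc", "type", "used", "quota", "q2", "status"]
FMT = "".join(f"{w}s" for w in WIDTHS)  # "8s8s15s12s11s12s6s"
RECORD_SIZE = struct.calcsize(FMT)      # 72 bytes in total


def parse_record(line):
    """Unpack one fixed-width line into a dict (assumes ASCII text)."""
    # struct.unpack needs exactly RECORD_SIZE bytes, so pad short lines and cut long ones
    raw = line.rstrip("\n").encode("ascii").ljust(RECORD_SIZE)[:RECORD_SIZE]
    values = struct.unpack(FMT, raw)
    return {name: value.decode("ascii").strip() for name, value in zip(NAMES, values)}


# Build a sample line padded to the hypothetical widths (your file should already look like this).
sample_values = ["/vol0", "abcd4", "Object RAID6", "228.33 GB", "--", "400.00 GB", "Online"]
sample = "".join(value.ljust(width) for value, width in zip(sample_values, WIDTHS))
print(parse_record(sample))
```

Compared with the slicing approach above, this keeps all the widths in one format string; plain string slicing avoids the bytes round-trip and is probably simpler if the file is not pure ASCII.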
Answer:
If you want to keep your original approach, something like this will handle the case where a line has only 10 'columns' instead of the expected 11:
with open('dataset_input.txt') as f:
    lines = f.readlines()

for line in lines:
    line = line.strip().split()  # remove surrounding whitespace and split on whitespace; returns a list
    if not line:
        continue  # skip blank lines
    if line[6] == '--':
        # This means there is no quota value present,
        # so insert another '--' to bring the line back to 11 'columns'
        line.insert(6, '--')
    vol, bs, obj, raid, used, uunit, quota, qunit, q2, q2unit, status = line
    # Perform any calculations and prints you want here,
    # PER LINE (each iteration overwrites the variables above).
    # Note that all variables are strings, so convert them if required.
You can of course change the "--" to anything you want, e.g.:
...
line.insert(6, '0')
...
and you can also replace the original "--" (which ends up in qunit) if you wish:
...
line[6] = '0'
line.insert(6, '0')
...
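Since every variable is a string, here is a minimal sketch of converting the size columns to numbers. The helper `to_gb` and the `UNIT_TO_GB` table are hypothetical, and the factor of 1024 GB per TB is an assumption: check whether your tool reports binary or decimal units.

```python
# Hypothetical conversion table; use 1000 instead of 1024 if your tool reports decimal units.
UNIT_TO_GB = {"GB": 1, "TB": 1024}


def to_gb(value, unit):
    """Convert a value/unit pair such as ('228.33', 'GB') to a float in GB; None for '--'."""
    if value == "--":
        return None  # placeholder meaning "no value"
    return float(value) * UNIT_TO_GB[unit]


line = "/vol0 abcd4 Object RAID6 228.33 GB -- 400.00 GB Online".split()
if line[6] == "--":
    line.insert(6, "--")  # same padding as in the loop above
vol, bs, obj, raid, used, uunit, quota, qunit, q2, q2unit, status = line

used_gb = to_gb(used, uunit)
quota_gb = to_gb(quota, qunit)
q2_gb = to_gb(q2, q2unit)
print(vol, used_gb, quota_gb, q2_gb)  # /vol0 228.33 None 400.0
```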
On an unrelated side note, your original code uses input as the file handle. input is not a reserved keyword but the name of a Python built-in function; assigning to it is legal but shadows the built-in, so avoid using built-in names as identifiers in your code.
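A short demonstration of why the shadowing matters: once `input` names a file-like object, calling `input()` later in the same scope raises TypeError because that object is not callable. `io.StringIO` stands in for the open file here, and the prompt text is arbitrary.

```python
import io

input = io.StringIO("line 1\n")  # stands in for open(inputfile), as in the question
message = None
try:
    input("Enter a value: ")  # 'input' is now the file object, not the built-in function
except TypeError as err:
    message = str(err)
print(message)

del input  # remove the shadowing name; the built-in input() is visible again
```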