I'm trying to parse a CSV file in Python. After the first line, the number of elements per line increases from 6 to 7.
CSV example:
Title,Name,Job,Email,Address,ID
Eng.,"FirstName, LastName",Engineer,[email protected],ACME Company,1234567
Eng.,"FirstName, LastName",Engineer,[email protected],ACME Company,1234567
I need a way to format and present the output as a clean table.
From my understanding, the problem with my code is that, starting from the second line, the number of CSV elements increases from 6 to 7, so it throws the following error:
print(stringFormat.format(item.split(',')[0], item.split(',')[1], item.split(',')[2],
item.split(',')[3], item.split(',')[4], item.split(',')[5],))
IndexError: list index out of range
My code:
stringFormat = "{:>10} {:>10} {:>10} {:>10} {:>10} {:>10}"
with open("the_file", 'r') as file:
    for item in file.readlines():
        print(stringFormat.format(item.split(',')[0], item.split(',')[1],
                                  item.split(',')[2], item.split(',')[3],
                                  item.split(',')[4], item.split(',')[5],
                                  item.split(',')[6]))
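To see where the 6-versus-7 mismatch comes from, here is a minimal check of what split(',') returns for the header and for a data line. The two strings are copied from the CSV example above (the email value is the placeholder as shown there). The header yields only 6 elements, so index [6] is out of range on the very first line, while the data line yields 7 because the quoted name is cut at its inner comma.

```python
# Sample lines copied from the CSV example above.
header = 'Title,Name,Job,Email,Address,ID'
row = 'Eng.,"FirstName, LastName",Engineer,[email protected],ACME Company,1234567'

header_fields = header.split(',')
row_fields = row.split(',')

print(len(header_fields))  # 6 elements, so header_fields[6] raises IndexError
print(len(row_fields))     # 7 elements, the quoted name was split in two
print(row_fields[1:3])     # the two halves of the name, quotes still attached
```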
CodePudding user response:
You can do this with very simple for loops, as shown below. I've added a print statement to show the effect of the parsing.
# 'r' is not needed, it is the default value if omitted
with open("file_name") as infile:
    result = []
    # split the read() into a list of lines
    # I prefer this over readlines() as this removes the EOL character
    # automagically (I mean the `\n` char)
    for line in infile.read().splitlines():
        # check if line is empty (stripping all spaces)
        if len(line.strip()) == 0:
            continue
        # another way would be to check for ',' characters
        if ',' not in line:
            continue
        # set some helper variables
        line_result = []
        found_quote = False
        element = ""
        # iterate over the line by character
        for c in line:
            # toggle the found_quote if quote found
            if c == '"':
                found_quote = not found_quote
                continue
            if c == ",":
                if found_quote:
                    # a comma inside quotes is part of the element
                    element += c
                else:
                    # append the element to the line_result and reset element
                    line_result.append(element)
                    element = ""
            else:
                # append c to the element
                element += c
        # append leftover element to the line_result
        line_result.append(element)
        # append the line_result to the final result
        result.append(line_result)
        print(len(line_result), line_result)
print('------------------------------------------------------------')
stringFormat = "{:>10} {:>20} {:>20} {:>20} {:>20} {:>10}"
for line in result:
    print(stringFormat.format(*line))
Output:
6 ['Title', 'Name', 'Job', 'Email', 'Address', 'ID']
6 ['Eng.', 'FirstName, LastName', 'Engineer', '[email protected]', 'ACME Company', '1234567']
6 ['Eng.', 'FirstName, LastName', 'Engineer', '[email protected]', 'ACME Company', '1234567']
------------------------------------------------------------
     Title                 Name                  Job                Email              Address         ID
      Eng.  FirstName, LastName             Engineer    [email protected]         ACME Company    1234567
      Eng.  FirstName, LastName             Engineer    [email protected]         ACME Company    1234567
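As an alternative to scanning each line by character, Python's standard csv module already treats a double-quoted field as one value, commas included. The following is a minimal sketch, not a drop-in replacement: it assumes the same six-column layout and uses io.StringIO as a stand-in for the file so that it runs by itself. With a real file, pass open("file_name", newline="") to csv.reader instead.

```python
import csv
import io

# Stand-in for the file on disk (assumption for this sketch); the content
# is the CSV example from the question.
sample = io.StringIO(
    'Title,Name,Job,Email,Address,ID\n'
    'Eng.,"FirstName, LastName",Engineer,[email protected],ACME Company,1234567\n'
    '\n'
)

# csv.reader yields one list of strings per line; a blank line comes back
# as an empty list, which the filter drops.
rows = [row for row in csv.reader(sample) if row]

stringFormat = "{:>10} {:>20} {:>20} {:>20} {:>20} {:>10}"
for row in rows:
    print(stringFormat.format(*row))
```

Every row then has 6 elements, with 'FirstName, LastName' kept as a single field.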
CodePudding user response:
You could try something like this. The for loop uses the length of the split item, so the lines can vary in length.
stringFormats = ["{:>10}", "{:>10}", "{:>10}", "{:>10}", "{:>10}", "{:>10}", "{:>10}"]
with open("the_file", 'r') as file:
    for item in file.readlines():
        # drop the trailing newline so it does not end up in the last field
        s_item = item.rstrip('\n').split(',')
        f_item = ''
        # build the output line one formatted field at a time
        for x in range(len(s_item)):
            f_item += stringFormats[x].format(s_item[x])
        print(f_item)
Of course, stringFormats needs at least as many entries as the longest line has fields. If every field uses the same format, you could change stringFormat back to a single string and apply it to each field instead of indexing into a list.
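stringFormat = "{:>10}"
with open("the_file", 'r') as file:
    for item in file.readlines():
        # drop the trailing newline so it does not end up in the last field
        s_item = item.rstrip('\n').split(',')
        f_item = ''
        # apply the same format to every field and concatenate
        for a_field in s_item:
            f_item += stringFormat.format(a_field)
        print(f_item)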
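A fixed width of 10 lets longer fields run into their neighbours. If the widths should adapt to the data, one option is to compute them from the longest cell in each column. The sketch below makes two assumptions that are not part of the answer above: it splits lines with the standard csv module (which also keeps quoted commas intact), and it introduces a hypothetical helper named format_table. Short rows are padded with empty cells, so lines of different lengths still line up.

```python
import csv
import io

def format_table(rows, gap=2):
    """Hypothetical helper: right-align each column to its longest cell."""
    if not rows:
        return []
    n_cols = max(len(row) for row in rows)
    # Pad short rows so that every row has the same number of cells.
    padded = [list(row) + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    sep = " " * gap
    return [sep.join(cell.rjust(w) for cell, w in zip(row, widths))
            for row in padded]

# Stand-in for open("the_file", newline="") so the sketch runs by itself.
sample = io.StringIO(
    'Title,Name,Job,Email,Address,ID\n'
    'Eng.,"FirstName, LastName",Engineer,[email protected],ACME Company,1234567\n'
)
rows = [row for row in csv.reader(sample) if row]
for line in format_table(rows):
    print(line)
```

Because the widths come from the data, no format string has to be updated when a column grows or a line gains a field.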