How to parse a CSV file with different line elements without using an external library?

I'm trying to parse a CSV file in Python. The number of elements per line increases from 6 in the first line to 7 in the lines that follow.

CSV example:

Title,Name,Job,Email,Address,ID
Eng.,"FirstName, LastName",Engineer,[email protected],ACME Company,1234567
Eng.,"FirstName, LastName",Engineer,[email protected],ACME Company,1234567

I need a way to format the output as a clean table.

As I understand it, the problem is that from the second line onward the number of CSV elements increases from 6 to 7, so my code throws the following error:

print(stringFormat.format(item.split(',')[0], item.split(',')[1], item.split(',')[2],
                          item.split(',')[3], item.split(',')[4], item.split(',')[5],))
IndexError: list index out of range

My code:

stringFormat = "{:>10} {:>10} {:>10} {:>10} {:>10}  {:>10}"

with open("the_file", 'r') as file:
     for item in file.readlines():
            print(stringFormat.format(item.split(',')[0], item.split(',')[1],
                                      item.split(',')[2], item.split(',')[3],
                                      item.split(',')[4], item.split(',')[5],
                                      item.split(',')[6]))

CodePudding user response：

You can do this with simple for loops, as shown below. I've added a print statement to show the effect.

# 'r' is not needed, it is the default value if omitted
with open("file_name") as infile:
    result = []
    # split the read() into a list of lines
    # I prefer this over readlines() as this removes the EOL character
    # automagically (I mean the `\n` char) 
    for line in infile.read().splitlines():
        # check if line is empty (stripping all spaces)
        if len(line.strip()) == 0: 
            continue
        # another way would be to check for ',' characters
        if ',' not in line:
            continue
        # set some helper variables
        line_result = []
        found_quote = False
        element = ""
        # iterate over the line by character
        for c in line:
            # toggle the found_quote if quote found
            if c == '"':
                found_quote = not found_quote
                continue
            if c == ",":
                if found_quote:
                    # a comma inside quotes belongs to the element
                    element += c
                else:
                    # append the element to the line_result and reset element
                    line_result.append(element)
                    element = ""
            else:
                # append c to the element
                element += c
        # append leftover element to the line_result
        line_result.append(element)
        
        # append the line_result to the final result
        result.append(line_result)
        print(len(line_result), line_result)


print('------------------------------------------------------------')
stringFormat = "{:>10} {:>20} {:>20} {:>20} {:>20}  {:>10}"

for line in result:
    print(stringFormat.format(*line))

Output:

6 ['Title', 'Name', 'Job', 'Email', 'Address', 'ID']
6 ['Eng.', 'FirstName, LastName', 'Engineer', '[email protected]', 'ACME Company', '1234567']
6 ['Eng.', 'FirstName, LastName', 'Engineer', '[email protected]', 'ACME Company', '1234567']
------------------------------------------------------------
     Title                 Name                  Job                Email              Address          ID
      Eng.  FirstName, LastName             Engineer    [email protected]         ACME Company     1234567
      Eng.  FirstName, LastName             Engineer    [email protected]         ACME Company     1234567
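
If the hard-coded widths in stringFormat become a nuisance, the same character loop could be wrapped in a function and the column widths derived from the data. This is only a sketch: the names split_csv_line and format_table are made up for illustration, the sample rows are inlined instead of read from a file, and the Email column is left out because the address in the question is redacted. It also pads short rows with empty strings, which is an assumption about how uneven rows should be shown.

```python
def split_csv_line(line):
    """Split one CSV line on commas, ignoring commas inside double quotes."""
    fields, element, in_quotes = [], "", False
    for c in line:
        if c == '"':
            in_quotes = not in_quotes  # quote characters themselves are dropped
        elif c == "," and not in_quotes:
            fields.append(element)
            element = ""
        else:
            element += c
    fields.append(element)  # leftover element after the last comma
    return fields


def format_table(rows):
    """Right-align every column to the width of its longest value."""
    n_cols = max(len(row) for row in rows)
    # pad short rows so every row has the same number of columns
    padded = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    return [" ".join(f"{cell:>{w}}" for cell, w in zip(row, widths))
            for row in padded]


# sample rows from the question, without the redacted Email column
sample = [
    'Title,Name,Job,Address,ID',
    'Eng.,"FirstName, LastName",Engineer,ACME Company,1234567',
]
rows = [split_csv_line(line) for line in sample]
for text in format_table(rows):
    print(text)
```

Each column ends up exactly as wide as its longest value, so no format string has to be adjusted by hand when the data changes.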

CodePudding user response：

You could try something like this. The for loop uses the length of the split item, so the lines can vary in length.

stringFormats = ["{:>10}", "{:>10}", "{:>10}", "{:>10}", "{:>10}", "{:>10}", "{:>10}"]

with open("the_file", 'r') as file:
    for item in file.readlines():
        # drop the trailing newline so it doesn't end up in the last field
        s_item = item.rstrip('\n').split(',')
        f_item = ''
        for x in range(len(s_item)):
            f_item += stringFormats[x].format(s_item[x])
        print(f_item)

Of course, stringFormats needs at least as many entries as the longest line has fields. If every column uses the same format, you could change stringFormat back to a single string instead of indexing into a list.

stringFormat = "{:>10}"

with open("the_file", 'r') as file:
    for item in file.readlines():
        # drop the trailing newline so it doesn't end up in the last field
        s_item = item.rstrip('\n').split(',')
        f_item = ''
        for a_field in s_item:
            f_item += stringFormat.format(a_field)
        print(f_item)
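
One caveat with both versions in this answer: a plain split(',') knows nothing about quotes, so the quoted name is cut at its comma and printed as two columns. A small sketch with the sample rows from the question shows the effect (the email address here is a placeholder, since the one in the question is redacted):

```python
header = 'Title,Name,Job,Email,Address,ID'
# user@example.com is a placeholder address, not from the original data
row = 'Eng.,"FirstName, LastName",Engineer,user@example.com,ACME Company,1234567'

header_fields = header.split(',')
row_fields = row.split(',')

# the header has 6 fields, the data row is split into 7
print(len(header_fields), len(row_fields))
# the name is cut in two, and the quote characters stay in the pieces
print(row_fields[1:3])
```

If the name has to stay in one column, the split needs to track whether it is inside quotes, as the character-by-character loop in the other answer does.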