How to parse a CSV file with different line elements without using an external library?

Time:03-18

I'm trying to parse a CSV file in Python; the elements in the file increase after the first line from 6 to 7.

CSV example:

Title,Name,Job,Email,Address,ID
Eng.,"FirstName, LastName",Engineer,[email protected],ACME Company,1234567
Eng.,"FirstName, LastName",Engineer,[email protected],ACME Company,1234567

I need a way to format and present the output into a clean table.

From my understanding, the problem with my code is that starting from the second line, the CSV elements increase from 6 to 7. Thus, it throws the following error.

print(stringFormat.format(item.split(',')[0], item.split(',')[1], item.split(',')[2],
                          item.split(',')[3], item.split(',')[4], item.split(',')[5],))
IndexError: list index out of range

My code:

stringFormat = "{:>10} {:>10} {:>10} {:>10} {:>10}  {:>10}"

with open("the_file", 'r') as file:
     for item in file.readlines():
            print(stringFormat.format(item.split(',')[0], item.split(',')[1],
                                      item.split(',')[2], item.split(',')[3],
                                      item.split(',')[4], item.split(',')[5],
                                      item.split(',')[6]))

CodePudding user response:

You can do this with simple for loops, as shown below. I've added a print statement that shows the number of fields and the parsed fields for each line.

# 'r' is not needed, it is the default value if omitted
with open("file_name") as infile:
    result = []
    # split the read() into a list of lines
    # I prefer this over readlines() as this removes the EOL character
    # automagically (I mean the `\n` char) 
    for line in infile.read().splitlines():
        # check if line is empty (stripping all spaces)
        if len(line.strip()) == 0: 
            continue
        # another way would be to check for ',' characters
        if ',' not in line:
            continue
        # set some helper variables
        line_result = []
        found_quote = False
        element = ""
        # iterate over the line by character
        for c in line:
            # toggle the found_quote if quote found
            if c == '"':
                found_quote = not found_quote
                continue
            if c == ",":
                if found_quote:
                    # a comma inside quotes is part of the element
                    element += c
                else:
                    # append the element to the line_result and reset element
                    line_result.append(element)
                    element = ""
            else:
                # append c to the element
                element += c
        # append leftover element to the line_result
        line_result.append(element)
        
        # append the line_result to the final result
        result.append(line_result)
        print(len(line_result), line_result)


print('------------------------------------------------------------')
stringFormat = "{:>10} {:>20} {:>20} {:>20} {:>20}  {:>10}"

for line in result:
    print(stringFormat.format(*line))

Output:

6 ['Title', 'Name', 'Job', 'Email', 'Address', 'ID']
6 ['Eng.', 'FirstName, LastName', 'Engineer', '[email protected]', 'ACME Company', '1234567']
6 ['Eng.', 'FirstName, LastName', 'Engineer', '[email protected]', 'ACME Company', '1234567']
------------------------------------------------------------
     Title                 Name                  Job                Email              Address          ID
      Eng.  FirstName, LastName             Engineer    [email protected]         ACME Company     1234567
      Eng.  FirstName, LastName             Engineer    [email protected]         ACME Company     1234567
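
If you would rather not hard-code the column widths, one option is to derive them from the parsed rows. This is only a rough sketch: `column_widths` and `format_table` are names made up for illustration, and `rows` stands in for the `result` list built by the loop above.

```python
def column_widths(rows):
    """Return the length of the widest cell for each column position."""
    widths = []
    for row in rows:
        for i, cell in enumerate(row):
            if i == len(widths):
                # first time this column position is seen
                widths.append(len(cell))
            else:
                widths[i] = max(widths[i], len(cell))
    return widths


def format_table(rows, gap=2):
    """Right-align every cell to its column width; columns are `gap` spaces apart."""
    widths = column_widths(rows)
    sep = " " * gap
    return [
        sep.join(cell.rjust(widths[i]) for i, cell in enumerate(row))
        for row in rows
    ]


# sample data standing in for the parsed `result` list
rows = [
    ["Title", "Name", "Job"],
    ["Eng.", "FirstName, LastName", "Engineer"],
]
for line in format_table(rows):
    print(line)
```

Because the widths are computed per column position, this also copes with rows that have fewer fields than others.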

CodePudding user response:

You could try something like this. The for loop iterates over the length of the split item, so lines can vary in length.

stringFormats = ["{:>10}", "{:>10}", "{:>10}", "{:>10}", "{:>10}", "{:>10}", "{:>10}"]

with open("the_file", 'r') as file:
    for item in file.readlines():
        # strip the trailing newline so it does not end up in the last field
        s_item = item.rstrip('\n').split(',')
        f_item = ''
        for x in range(len(s_item)):
            f_item += stringFormats[x].format(s_item[x])
        print(f_item)

Of course, stringFormats needs at least as many entries as the longest line has fields. If every field uses the same format, you can change stringFormat back to a single string and loop over the fields directly instead of indexing into a list.

stringFormat = "{:>10}"

with open("the_file", 'r') as file:
    for item in file.readlines():
        # strip the trailing newline so it does not end up in the last field
        s_item = item.rstrip('\n').split(',')
        f_item = ''
        for a_field in s_item:
            f_item += stringFormat.format(a_field)
        print(f_item)
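
Keep in mind that a plain `split(',')` still cuts the quoted name in two, which is where the seventh element in the question comes from. Here is a small sketch to illustrate the difference. `split_quoted` is a hypothetical helper, not part of the code posted above, it does not handle escaped quotes, and `user@example.com` is a placeholder for the obscured email address.

```python
def split_quoted(line):
    """Split on commas that are outside double quotes (no escaped-quote support)."""
    fields, current, in_quotes = [], "", False
    for ch in line:
        if ch == '"':
            # entering or leaving a quoted section; drop the quote itself
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            fields.append(current)
            current = ""
        else:
            current += ch
    fields.append(current)
    return fields


header = "Title,Name,Job,Email,Address,ID"
row = 'Eng.,"FirstName, LastName",Engineer,user@example.com,ACME Company,1234567'

# naive split: the data row has one more field than the header
print(len(header.split(",")), len(row.split(",")))        # 6 7
# quote-aware split: both lines have the same number of fields
print(len(split_quoted(header)), len(split_quoted(row)))  # 6 6
```

With the naive split, every field after the name shifts one column to the right, so the table no longer lines up with the header.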