Python: reading a CSV file whose last column has a variable number of values

Time:10-30

I have a csv file with the following data:

name, postcode, meals
John, 27133, breakfast
Mary, 90356, lunch, supper
David, 95221, breakfast, lunch, supper

How can I read each row as a dictionary, with the last field as a list? For example:

d_john = {
    'name': 'John',
    'postcode': 27133,
    'meals': ['breakfast']
}

d_mary = {
    'name': 'Mary',
    'postcode': 90356,
    'meals': ['lunch', 'supper'],
}

d_david = {
    'name': 'David',
    'postcode': 95221,
    'meals': ['breakfast', 'lunch', 'supper']
}

Answer:

Use:

# change data.csv to your file_path
with open("data.csv") as infile:
    next(infile)  # skip header
    for line in infile:
        name, postcode, *meals = line.strip().split(", ")
        print({"name": name, "postcode": postcode, "meals": meals})

Output

{'name': 'John', 'postcode': '27133', 'meals': ['breakfast']}
{'name': 'Mary', 'postcode': '90356', 'meals': ['lunch', 'supper']}
{'name': 'David', 'postcode': '95221', 'meals': ['breakfast', 'lunch', 'supper']}

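The function next advances the file iterator by one line, which skips the header. Each remaining line is then split on ", ", and extended iterable unpacking assigns the first two fields to name and postcode and collects everything else into the list meals. Note that postcode stays a string, and that the split assumes every separator is exactly a comma followed by one space.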
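The desired dictionaries in the question hold postcode as an integer, while the split approach leaves it as a string. A minimal sketch of one way to close that gap follows; the helper name parse_line is hypothetical, and it assumes that no field contains a comma of its own and that the second field is always numeric.

```python
def parse_line(line):
    """Turn one data line into a dict; fields after the second go into 'meals'."""
    # Split on the bare comma and strip each field, so "a,b" and "a, b" both work.
    name, postcode, *meals = [field.strip() for field in line.split(",")]
    # int() raises ValueError if the postcode is not numeric (assumed numeric here).
    return {"name": name, "postcode": int(postcode), "meals": meals}


record = parse_line("Mary, 90356, lunch, supper")
print(record)
# {'name': 'Mary', 'postcode': 90356, 'meals': ['lunch', 'supper']}
```

A line with only two fields yields an empty meals list, because the starred target collects zero items.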

A better alternative may be to use csv.DictReader:

import csv

# change data.csv to your file_path
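with open("data.csv", newline="") as infile:  # newline="" is what the csv docs recommend
    reader = csv.DictReader(infile, fieldnames=["name", "postcode"], restkey="meals", skipinitialspace=True)
    next(reader)  # skip the header row, which is not consumed when fieldnames is given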
    for row in reader:
        print(dict(row))

Output

{'name': 'John', 'postcode': '27133', 'meals': ['breakfast']}
{'name': 'Mary', 'postcode': '90356', 'meals': ['lunch', 'supper']}
{'name': 'David', 'postcode': '95221', 'meals': ['breakfast', 'lunch', 'supper']}

From the csv.DictReader documentation:

The fieldnames parameter is a sequence. If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames. Regardless of how the fieldnames are determined, the dictionary preserves their original ordering.

If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by restkey (which defaults to None).
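One consequence of the quoted behaviour is that the restkey entry is only created when a row actually has extra fields, so a row with no meals would have no 'meals' key at all. The sketch below shows one possible way to handle that and to convert postcode to an integer, as in the question's desired output. The in-memory sample data and the "Anna" row are made up for illustration.

```python
import csv
import io

# Hypothetical sample data; the "Anna" row has no meals at all.
data = io.StringIO(
    "name, postcode, meals\n"
    "John, 27133, breakfast\n"
    "Anna, 10001\n"
)

reader = csv.DictReader(
    data, fieldnames=["name", "postcode"], restkey="meals", skipinitialspace=True
)
next(reader)  # skip the header row

rows = []
for row in reader:
    row.setdefault("meals", [])  # restkey is absent when there are no extra fields
    row["postcode"] = int(row["postcode"])  # the csv module returns strings
    rows.append(dict(row))

print(rows)
# [{'name': 'John', 'postcode': 27133, 'meals': ['breakfast']},
#  {'name': 'Anna', 'postcode': 10001, 'meals': []}]
```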

The explanation of skipinitialspace=True can be found in the Dialects and Formatting Parameters section of the documentation; quoting it for completeness:

When True, whitespace immediately following the delimiter is ignored. The default is False.
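To see what this parameter changes, the following small sketch parses the same line with and without it, using csv.reader on a one-element list (the sample line is taken from the question's data).

```python
import csv

line = ["Mary, 90356, lunch, supper"]

# Default dialect: the space after each comma is kept as part of the field.
default_fields = next(csv.reader(line))
# With skipinitialspace=True the space after each delimiter is dropped.
stripped_fields = next(csv.reader(line, skipinitialspace=True))

print(default_fields)   # ['Mary', ' 90356', ' lunch', ' supper']
print(stripped_fields)  # ['Mary', '90356', 'lunch', 'supper']
```

Only whitespace directly after the delimiter is affected; trailing spaces before a comma would still be kept.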
