read a csv with multiple values in one cell-CodePudding

I have this csv:

Type,ID,Value1,Value2,Name,Text
TypeA,1231,"value1,value2,value3","value7,value8, value9",name1,
TypeA,2123,,,name2,textA
TypeA,4242,,,name3,
TypeA,5135,,,name4,
TypeA,2123,,,name5,
TypeA,7525,,,name6,
TypeA,6869,value4,,name7,
TypeB,9654,"value5, value6",,name8,textB
TypeB,3225,,,name9,
TypeB,6545,,value10,name10,

how do I make it into a dictionary with some lists if there is more than one value? i've tried this:

with open(csv_file,'r') as f:
    csv_list = [[val.strip() for val in r.split(",")] for r in f.readlines()]

(_, *header), *data = csv_list
print(csv_list)
csv_dict = {}
for row in data:
    key, *values = row

    if key not in csv_dict:
        csv_dict[key] = []

    csv_dict[key].append({key: value for key, value in zip(header, values)})

For example, I want csv_dict['TypeB'][0] to print :

{'ID': '9654', 'Value1': ["value5, value6"], 'Value2': [], 'Name': 'name8', 'Text': 'textB'}

But it prints:

{'ID': '9654', 'Value1': '"value5', 'Value2': 'value6"', 'Name': '', 'Text': 'name8'}

CodePudding user response：

Try:

import csv

with open("your_file.csv", "r") as f_in:
    reader = csv.DictReader(f_in)
    data = list(reader)

    for row in data:
        row["Value1"] = [
            ss for s in row["Value1"].split(",") if (ss := s.strip())
        ]
        row["Value2"] = [
            ss for s in row["Value2"].split(",") if (ss := s.strip())
        ]


print(data)

Prints:

[
    {
        "Type": "TypeA",
        "ID": "1231",
        "Value1": ["value1", "value2", "value3"],
        "Value2": ["value7", "value8", "value9"],
        "Name": "name1",
        "Text": "",
    },
    {
        "Type": "TypeA",
        "ID": "2123",
        "Value1": [],
        "Value2": [],
        "Name": "name2",
        "Text": "textA",
    },
    {
        "Type": "TypeA",
        "ID": "4242",
        "Value1": [],
        "Value2": [],
        "Name": "name3",
        "Text": "",
    },

...and so on.

CodePudding user response：

Read your file using csv.DictReader instead of manually splitting lines on commas. csv.DictReader takes care of commas escaped by quotes.

with open(csv_file, 'r') as f:
    reader = csv.DictReader(f)
    for data in reader:
        print(data)

Which creates dicts for each row in your file, with the fields in quotes read as a single string, like so:

{'Type': 'TypeA', 'ID': '1231', 'Value1': 'value1,value2,value3', 'Value2': 'value7,value8, value9', 'Name': 'name1', 'Text': ''}

Now, since you want your Value1 item to be a list, you could split it by commas if the value contains a comma.

csv_dict = {}
with open(csv_file, 'r') as f:
    reader = csv.DictReader(f)
    for data in reader:
        # Overwrite with split result if data["Value1"] is not an empty string
        # Else, make an empty list
        data["Value1"] = data["Value1"].split(",") if data["Value1"] else []
        data["Value2"] = data["Value2"].split(",") if data["Value2"] else []

        if data["Type"] not in csv_dict:
            csv_dict[data["Type"]] = [data]
        else:
            csv_dict[data["Type"]].append(data)

Now, you have csv_dict["TypeB"][0] is:

{'Type': 'TypeB',
 'ID': '9654',
 'Value1': ['value5', ' value6'],
 'Value2': [],
 'Name': 'name8',
 'Text': 'textB'}

CodePudding user response：

Generally, if you use a simple r.split(",") it doesn't know anything about csv files (such as the fact that cells with multiple commas are marked with ""). Try using the csv library: https://docs.python.org/3/library/csv.html