Turning a CSV file with a header into a python dictionary

Time:03-29

Let's say I have the following example CSV file:

a,b
100,200
400,500

How would I turn it into a dictionary like the one below?

{a:[100,400],b:[200,500]}

I am having trouble figuring out how to do it manually, and I would like to understand that before I use a package. Can anyone help?

Some code I tried:

with open("fake.csv") as f:
    index= 0
    dictionary = {}
    for line in f:
        words = line.strip()
        words = words.split(",")
        if index >= 1:
            for i in range(len(headers_list)):
                dictionary[headers_list[i]] = words[i]
                # only returns the last element which makes sense
        else:
            headers_list = words
        index += 1

CodePudding user response:

At the very least, you should be using the built-in csv module to read CSV files, so that you don't have to parse them by hand. That said, this first approach is still applicable to your .strip and .split technique:

  1. Initialize a dictionary with the column names as keys and empty lists as values
  2. Read a line from the csv reader
  3. Zip the line's contents with the column names you got in step 1
  4. For each key:value pair in the zip, update the dictionary by appending

import csv

with open("test.csv", "r") as file:
    reader = csv.reader(file)
    column_names = next(reader)  # Reads the first line, which contains the header
    data = {col: [] for col in column_names}
    for row in reader:
        for key, value in zip(column_names, row):
            data[key].append(value)

Your issue was that you were using the assignment operator = to overwrite the contents of your dictionary on every iteration. This is why you either want to pre-initialize the dictionary like above, or use a membership check first to test if the key exists in the dictionary, adding it if not:

key = headers_list[i]
if key not in dictionary:
    dictionary[key] = []
dictionary[key].append(words[i])

An even cleaner shortcut is to take advantage of dict.get:

key = headers_list[i]
dictionary[key] = dictionary.get(key, []) + [words[i]]
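
Putting the membership check back into the .strip and .split loop from the question, a minimal sketch of the fully manual version might look like the following. The function name lines_to_columns and the sample list are illustrative assumptions, not part of the original code. It assumes every row has at least as many fields as the header and that no field contains a quoted comma.

```python
def lines_to_columns(lines):
    """Build {header: [values, ...]} from an iterable of CSV lines.

    Hypothetical helper: plain strip/split only, no quoting support.
    """
    lines = iter(lines)
    # The first line holds the column names.
    headers_list = next(lines).strip().split(",")
    dictionary = {}
    for line in lines:
        words = line.strip().split(",")
        if words == [""]:
            continue  # skip blank lines
        for i in range(len(headers_list)):
            key = headers_list[i]
            if key not in dictionary:
                dictionary[key] = []
            # Append instead of assigning, so earlier rows are kept.
            dictionary[key].append(words[i])
    return dictionary


sample = ["a,b\n", "100,200\n", "400,500\n"]
print(lines_to_columns(sample))  # {'a': ['100', '400'], 'b': ['200', '500']}
```

An open file is also an iterable of lines, so lines_to_columns(f) inside a with open("fake.csv") as f: block should work the same way. The values stay strings; wrapping words[i] in int() would give the integers shown in the question, provided every field is numeric.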

Another approach would be to take advantage of the csv package by reading each row of the csv file as a dictionary itself:

import csv

with open("test.csv", "r") as file:
    reader = csv.DictReader(file)
    data = {}
    for row_dict in reader:
        for key, value in row_dict.items():
            data[key] = data.get(key, []) + [value]

Another standard library package you could use to clean this up further is collections, with defaultdict(list), where you can directly append to the dictionary at a given key without worrying about initializing with an empty list if the key wasn't already there.
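
As a rough sketch of that defaultdict(list) idea, combined with csv.DictReader: the function name read_columns and the in-memory sample are assumptions made for illustration, and io.StringIO stands in for a real file so the snippet runs on its own.

```python
import csv
import io
from collections import defaultdict


def read_columns(file):
    """Collect each CSV column into a list, keyed by its header."""
    data = defaultdict(list)  # missing keys start out as an empty list
    for row_dict in csv.DictReader(file):
        for key, value in row_dict.items():
            data[key].append(value)
    return dict(data)  # convert back to a plain dict for the caller


sample = io.StringIO("a,b\n100,200\n400,500\n")
print(read_columns(sample))  # {'a': ['100', '400'], 'b': ['200', '500']}
```

With a real file, the call would be read_columns(file) inside the with open(...) block. The csv module returns every field as a string, so a conversion such as int(value) is still needed if numbers are wanted.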

CodePudding user response:

To do that, keep the column names and the data separate, then iterate over the columns and collect the value at the corresponding index from each data row. I am not sure whether this works with empty values.

However, I am fairly sure that going through pandas would be much easier; it is a widely used library for working with data from external files.

import csv

datas = []
with open('fake.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            cols = row
            line_count += 1
        else:
            datas.append(row)
            line_count += 1

result = {}  # avoid naming this "dict", which would shadow the built-in type

for index, col in enumerate(cols):  # iterate over the column names with their indices
  result[col] = []  # start an empty list for this column
  for data in datas:
    # append this row's value for the current column
    result[col].append(data[index])

print(result)
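
On the question of empty values: csv.reader returns an empty field as an empty string, so a line such as 100, should be handled by the loop above, but a row with fewer fields than the header would raise an IndexError at data[index]. One possible way around that, sketched here under the assumption that padding missing fields with an empty string is acceptable, is to transpose the rows with itertools.zip_longest. The helper name transpose_rows and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice, zip_longest


def transpose_rows(cols, rows, fillvalue=""):
    """Return {column name: [values, ...]}, padding short rows with fillvalue."""
    # Each tuple is (name, value from row 1, value from row 2, ...).
    padded = zip_longest(cols, *rows, fillvalue=fillvalue)
    # islice drops any extra fields beyond the width of the header.
    return {column[0]: list(column[1:]) for column in islice(padded, len(cols))}


# The second data row is deliberately missing its "b" field.
reader = csv.reader(io.StringIO("a,b\n100,200\n400\n"))
cols = next(reader)
datas = [row for row in reader if row]  # csv.reader yields [] for blank lines
print(transpose_rows(cols, datas))  # {'a': ['100', '400'], 'b': ['200', '']}
```

Whether silent padding is the right behaviour depends on the data; raising an error on short rows may be the better choice when they signal a corrupt file.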