Home > database >  How to convert csv file to python dictionary?
How to convert csv file to python dictionary?

Time:11-10

I have a huge csv file containing information about COVID-19 cases and deaths for every single county in the United States.

To give you a general idea of the information contained in this file, here are the first 10 lines of it:

date,county,state,fips,cases,deaths,confirmed_cases,confirmed_deaths,probable_cases,probable_deaths
2021-03-18,Autauga,Alabama,01001,6483,96,5557,85,926,11
2021-03-18,Baldwin,Alabama,01003,20263,295,14329,220,5934,75
2021-03-18,Barbour,Alabama,01005,2199,54,1225,37,974,17
2021-03-18,Bibb,Alabama,01007,2512,58,2031,35,481,23
2021-03-18,Blount,Alabama,01009,6371,129,4901,109,1470,20
2021-03-18,Bullock,Alabama,01011,1193,39,1059,29,134,10
2021-03-18,Butler,Alabama,01013,2069,66,1888,60,181,6
2021-03-18,Calhoun,Alabama,01015,14137,301,10608,242,3529,59
2021-03-18,Chambers,Alabama,01017,3460,113,1720,73,1740,40

Anyways, I want to create a Python dictionary for this data with each of the keys being a tuple consisting of the state and county names and the each of the values being a list of integers: the first int representing the number of confirmed cases and the second one representing the number of confirmed deaths.

Basically, I want output like this:

dic = {("state","county"):[confirmed_cases, confirmed_deaths]}

Please make sure to exclude the header.

How would I generate a python dictionary like the one above for all the counties in the csv file? Please use csv.reader.

Additionally, I need to find the sum of all the confirmed deaths for a particular state. How would I, for example, sum up the values in 'confirmed deaths' for all of the rows where 'state' is 'Alabama'?

EDIT: I came up with a solution for the first part of the problem:

mydict = {}

with open(file_path, mode='r') as inp:
    reader = csv.reader(inp)
    next(reader,None)
    mydict = {tuple(row[1:3]):list(row[6:8]) for row in reader}

return mydict

Can you help me figure out how to sum up confirmed deaths in a certain state based on this dictionary?

CodePudding user response:

I think pandas is the most appropriate solution:

import pandas as pd
df = pd.read_csv(file_path)
dict = df.set_index(['county','state'])[['confirmed_cases', 'confirmed_deaths']].apply(tuple, axis = 1).to_dict()
print(dict)

EDIT

for the sum part:

sum = df.groupby(['state'], as_index=False)['confirmed_cases', 'confirmed_deaths'].sum()
print(sum)

CodePudding user response:

I would actually do it a different way, albeit a bit more verbose but more readable by someone looking at the code.

import csv
from collections import namedtuple

County = namedtuple("County", ["name", "cases", "deaths"])
reader = csv.DictReader(data)

for row in reader:
    state = row["state"]
    county = row["county"]
    record = County(county, int(row["confirmed_cases"]), int(row["confirmed_deaths"]))
    if state in states:
        states[state].append(record)
    else:
        states[state] = [record]

{'Alabama': [County(name='Autauga', cases=5557, deaths=85),
  County(name='Baldwin', cases=14329, deaths=220),
  County(name='Barbour', cases=1225, deaths=37),
  County(name='Bibb', cases=2031, deaths=35),
  County(name='Blount', cases=4901, deaths=109),
  County(name='Bullock', cases=1059, deaths=29),
  County(name='Butler', cases=1888, deaths=60),
  County(name='Calhoun', cases=10608, deaths=242),
  County(name='Chambers', cases=1720, deaths=73)]}

sum(county.deaths for county in states["Alabama"])
>> 890

It will be easier to manage your code if you keep the key simple, in this case just the state. This will also be quicker if your data is larger since we won't have to iterate over tuple keys in the dictionary to grab the state we want.

CodePudding user response:

Try:

import csv

mydict = dict()
with open("test.csv") as inp:
    reader = csv.reader(inp)
    next(reader, None) #skip header
    mydict = {tuple(row[2:0:-1]): list(map(int, row[6:8])) for row in reader}

#total of all confirmed deaths in Alabama
>>> sum(v[1] for k, v in mydict.items() if k[0]=="Alabama")
890
test.csv:
date,county,state,fips,cases,deaths,confirmed_cases,confirmed_deaths,probable_cases,probable_deaths
2021-03-18,Autauga,Alabama,01001,6483,96,5557,85,926,11
2021-03-18,Baldwin,Alabama,01003,20263,295,14329,220,5934,75
2021-03-18,Barbour,Alabama,01005,2199,54,1225,37,974,17
2021-03-18,Bibb,Alabama,01007,2512,58,2031,35,481,23
2021-03-18,Blount,Alabama,01009,6371,129,4901,109,1470,20
2021-03-18,Bullock,Alabama,01011,1193,39,1059,29,134,10
2021-03-18,Butler,Alabama,01013,2069,66,1888,60,181,6
2021-03-18,Calhoun,Alabama,01015,14137,301,10608,242,3529,59
2021-03-18,Chambers,Alabama,01017,3460,113,1720,73,1740,40
  • Related