I have a csv file passed into a function as a string:
csv_input = """
quiz_date,location,size
2022-01-01,london_uk,134
2022-01-02,edingburgh_uk,65
2022-01-01,madrid_es,124
2022-01-02,london_uk,125
2022-01-01,edinburgh_uk,89
2022-01-02,madric_es,143
2022-01-02,london_uk,352
2022-01-01,edinburgh_uk,125
2022-01-01,madrid_es,431
2022-01-02,london_uk,151"""
I want to print the sum of how many people were surveyed in each city by date, so something like:
Date. City. Pop-Surveyed
2022-01-01. London. 134
2022-01-01. Edinburgh. 214
2022-01-01. Madrid. 555
2022-01-02. London. 628
2022-01-02. Edinburgh. 65
2022-01-02. Madrid. 143
As I can't import pandas on my machine (can't install without internet access) I thought I could use a defaultdict to store the value of each city by date
from collections import defaultdict
survery_data = csv_input.split()[1:]
survery_data = [survey.split(',') for survey in survery_data]
survey_sum = defaultdict(dict)
for survey in survery_data:
date = survey[0]
city = survey[1].split("_")[0]
quantity = survey[-1]
survey_sum[date][city] = quantity
print(survey_sum)
But doing this returns a KeyError:
KeyError: 'london'
When I was hoping to have a defaultdict of
{'2022-01-01': {'london': 134}, {'edinburgh': 214}, {'madrid': 555}},
{'2022-01-02': {'london': 628}, {'edinburgh': 65}, {'madrid': 143}}
Is there a way to create a default dict that gives a structure so I could then iterate over to print out each column like above?
CodePudding user response:
Try:
csv_input = """\
quiz_date,location,size
2022-01-01,london_uk,134
2022-01-02,edingburgh_uk,65
2022-01-01,madrid_es,124
2022-01-02,london_uk,125
2022-01-01,edinburgh_uk,89
2022-01-02,madric_es,143
2022-01-02,london_uk,352
2022-01-01,edinburgh_uk,125
2022-01-01,madrid_es,431
2022-01-02,london_uk,151"""
header, *rows = (
tuple(map(str.strip, line.split(",")))
for line in map(str.strip, csv_input.splitlines())
)
tmp = {}
for date, city, size in rows:
key = (date, city.split("_")[0])
tmp[key] = tmp.get(key, 0) int(size)
out = {}
for (date, city), size in tmp.items():
out.setdefault(date, []).append({city: size})
print(out)
Prints:
{
"2022-01-01": [{"london": 134}, {"madrid": 555}, {"edinburgh": 214}],
"2022-01-02": [{"edingburgh": 65}, {"london": 628}, {"madric": 143}],
}
CodePudding user response:
Changing
survey_sum = defaultdict(dict)
to
survey_sum = defaultdict(lambda: defaultdict(int))
allows the return of
defaultdict(<function survey_sum.<locals>.<lambda> at 0x100edd8b0>, {'2022-01-01': defaultdict(<class 'int'>, {'london': 134, 'madrid': 555, 'edinburgh': 214}), '2022-01-02': defaultdict(<class 'int'>, {'edingburgh': 65, 'london': 628, 'madrid': 143})})
Allowing iterating over to create a list.