Parsing data from file of 3 columns to store data in dictionary with structure dict[key1][key2]

Time:10-15

I am trying to solve the task:

  1. Read data from the "Region", "Range" and "Population" columns.

Region    Range   Population
Region1   23      1299
Region1   34      3400
Region2   24      3200
Region2   34      1209
Region2   45      3008

  2. Store the total population per range and region, such that the data_dict has the structure data_dict[region][range], where the values correspond to the population. For example, data_dict["Region1"]["23"] should contain the number of people that are 23 years old in the region "Region1".

I wanted to use a nested dictionary like this:

{'Region1': {23: 1299}, 'Region1': {34: 3400}, ...}

That would allow access such as:

data_dict['Region1']['34'] = 3400

The code:

with open(self.file_name, encoding='iso-8859-1') as file:
    reader = csv.DictReader(file)
    for line in reader:
        self.data_dict[line['Region']] = {line['Range']: line['Population']}

But it doesn't work as expected. Dictionary keys must be unique, so each new row for a region overwrites the value stored by the previous row. Only the last range of each region is accessible, and the rows nearer the top of the file are lost.
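The overwriting described above can be reproduced without the file. The following is a minimal sketch that assumes two hypothetical in-memory rows shaped like what csv.DictReader would yield for Region1; the `rows` list is illustrative and not part of the original code.

```python
# Hypothetical in-memory rows standing in for the file contents.
rows = [
    {"Region": "Region1", "Range": "23", "Population": "1299"},
    {"Region": "Region1", "Range": "34", "Population": "3400"},
]

data_dict = {}
for line in rows:
    # Each assignment replaces the whole inner dict stored under the region key,
    # so the earlier range for the same region is discarded.
    data_dict[line["Region"]] = {line["Range"]: line["Population"]}

print(data_dict)  # {'Region1': {'34': '3400'}}
```

Only the last row per region survives, which matches the behaviour described in the question.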

I would appreciate any help. Thank you.

CodePudding user response:

You need to check whether the region key already exists. If it doesn't, do what you are doing now; otherwise, add the new range key to that region's existing inner dictionary.

with open(self.file_name, encoding='iso-8859-1') as file:
    reader = csv.DictReader(file)
    for line in reader:
        if line['Region'] in self.data_dict:
            # Region already seen: add the range to its existing inner dict.
            self.data_dict[line['Region']][line['Range']] = line['Population']
        else:
            # First row for this region: create the inner dict.
            self.data_dict[line['Region']] = {line['Range']: line['Population']}

The final data structure looks more like this (the values are strings, because csv.DictReader returns every field as a string):

{
    "Region1": {
        "23": "1299",
        "34": "3400"
    },
    "Region2": {
        "24": "3200",
        "34": "1209",
        "45": "3008"
    }
}

but it allows access in the manner you describe, data_dict[region][range], for example data_dict['Region1']['34'].
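Since the task asks for the total population per region and range, one possible variation is to let collections.defaultdict create the inner dictionaries and to add up integer values. This is only a sketch under assumptions: the `load_population` helper and the in-memory `sample` data are hypothetical, and the columns are assumed to be comma-separated, which is the csv.DictReader default.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the real file; assumes comma-separated columns.
sample = io.StringIO(
    "Region,Range,Population\n"
    "Region1,23,1299\n"
    "Region1,34,3400\n"
    "Region2,24,3200\n"
    "Region2,34,1209\n"
    "Region2,45,3008\n"
)

def load_population(file):
    """Return a nested mapping data_dict[region][range] -> total population."""
    # Missing regions get a new inner defaultdict; missing ranges start at 0.
    data_dict = defaultdict(lambda: defaultdict(int))
    for line in csv.DictReader(file):
        # Convert to int so repeated (region, range) rows are summed.
        data_dict[line["Region"]][line["Range"]] += int(line["Population"])
    return data_dict

totals = load_population(sample)
print(totals["Region1"]["23"])  # 1299
print(totals["Region2"]["45"])  # 3008
```

One trade-off of this design is that looking up a missing region or range on a defaultdict silently creates an empty entry instead of raising KeyError, so converting the result to plain dicts afterwards may be preferable.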
