Missing features/groups from JSON file in Python

I am trying to extract the English and Korean names of each municipality in South Korea from a municipality-level GeoJSON file. Here is my Python code:

import json
import pandas as pd

# Load the municipality-level GeoJSON file
with open('skorea-municipalities-2018-geo.json', 'r') as f:
    Korean_municipalities = json.load(f)

# Map Korean name -> English name
munic_map_eng = {}
for feature in Korean_municipalities['features']:
    feature['id'] = feature['properties']['name_eng']
    munic_map_eng[feature['properties']['name']] = feature['id']

df_munic = pd.DataFrame(list(munic_map_eng.items()))

There are 250 municipalities. That is

len(Korean_municipalities['features']) = 250

However, there are only 227 in the data frame df_munic. That is

df_munic.shape = (227,2)

It seems that 23 municipalities are missing. I use the same code at the province and sub-municipality levels. At the sub-municipality level the issue is the same: there are 3504 sub-municipalities, but the data frame has only 3142 rows. There is no such problem at the province level (17 provinces).

Any idea what is going wrong?

Thanks!

CodePudding user response:

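There must be duplicate feature['properties']['name'] values. You're using the name as the dictionary key, and keys must be unique, so each later feature with the same name overwrites the earlier one and you get only one row in the dataframe per distinct name. That would also explain why the province level is unaffected: all 17 province names are distinct.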
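One way to check this is to count how often each name occurs and compare the number of lost entries with the gap you observed. The sketch below uses a small hand-made `features` list as a stand-in for `Korean_municipalities['features']`; the sample entries are illustrative and are not taken from the real file.

```python
from collections import Counter

# Toy stand-in for Korean_municipalities['features'] (assumed sample data,
# not the real file contents).
features = [
    {"properties": {"name": "중구", "name_eng": "Jung-gu"}},
    {"properties": {"name": "종로구", "name_eng": "Jongno-gu"}},
    {"properties": {"name": "중구", "name_eng": "Jung-gu"}},
]

# Count how many features share each Korean name.
name_counts = Counter(f["properties"]["name"] for f in features)
duplicates = {name: n for name, n in name_counts.items() if n > 1}

# A name that occurs n times collapses into one dict entry, losing n - 1 rows.
lost_rows = sum(n - 1 for n in duplicates.values())

# Same dict-building pattern as in the question.
munic_map = {f["properties"]["name"]: f["properties"]["name_eng"] for f in features}

print(duplicates)
print(len(features), len(munic_map), lost_rows)
```

If duplicate names are the cause, running the same counting on the real municipality file should give a `lost_rows` value equal to the 23 missing rows.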

Use a list of tuples instead of a dictionary so that every feature is kept.

import json
import pandas as pd

# Load the municipality-level GeoJSON file
with open('skorea-municipalities-2018-geo.json', 'r') as f:
    Korean_municipalities = json.load(f)

# Collect (Korean name, English name) pairs; a list keeps duplicate names
munic_list_eng = []
for feature in Korean_municipalities['features']:
    feature['id'] = feature['properties']['name_eng']
    munic_list_eng.append((feature['properties']['name'], feature['id']))

df_munic = pd.DataFrame(munic_list_eng)
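Once every feature is in the dataframe, pandas can show which names repeat. This is a minimal sketch on toy rows standing in for `munic_list_eng`; the column names `name` and `name_eng` are chosen here for illustration, and the sample rows are not the real file contents.

```python
import pandas as pd

# Toy rows standing in for munic_list_eng (assumed sample data).
rows = [("중구", "Jung-gu"), ("종로구", "Jongno-gu"), ("중구", "Jung-gu")]
df = pd.DataFrame(rows, columns=["name", "name_eng"])

# keep=False flags every member of a duplicated group, not only the repeats.
repeated = df[df["name"].duplicated(keep=False)]

n_rows = len(df)                 # one row per feature
n_unique = df["name"].nunique()  # what the dictionary version would keep

print(repeated)
print(n_rows, n_unique)
```

On the real data, `n_rows` should match `len(Korean_municipalities['features'])`, and `n_unique` should match the row count the dictionary version produced.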