I am trying to extract the English and Korean names of each municipality in South Korea from a municipality-level GeoJSON file. Here is my Python code:
import json
import pandas as pd
Korean_municipalities = json.load(open('skorea-municipalities-2018-geo.json', 'r'))
munic_map_eng = {}
for feature in Korean_municipalities['features']:
    feature['id'] = feature['properties']['name_eng']
    munic_map_eng[feature['properties']['name']] = feature['id']
df_munic = pd.DataFrame(list(munic_map_eng.items()))
There are 250 municipalities. That is
len(Korean_municipalities['features']) = 250
However, there are only 227 in the data frame df_munic. That is
df_munic.shape = (227,2)
So 23 municipalities seem to be missing. I use the same code at the province and sub-municipality levels. At the sub-municipality level the issue is the same: there are 3504 sub-municipalities, but the data frame has only 3142 rows. At the province level (17 provinces) there is no such problem.
Any idea where things may go wrong?
Thanks!
CodePudding user response:
There must be duplicate feature['properties']['name'] values. You're using this as the dictionary key, and keys must be unique, so you only get one row in the dataframe for each name.
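A toy illustration of the collision, using made-up features rather than the real file (the names and English labels below are assumptions for demonstration only):

```python
# Hypothetical features: two different districts share the same Korean name.
features = [
    {"properties": {"name": "중구", "name_eng": "Jung-gu (Seoul)"}},
    {"properties": {"name": "중구", "name_eng": "Jung-gu (Busan)"}},
    {"properties": {"name": "강남구", "name_eng": "Gangnam-gu"}},
]

name_map = {}
for feature in features:
    # A repeated key overwrites the earlier value instead of adding a new entry.
    name_map[feature["properties"]["name"]] = feature["properties"]["name_eng"]

print(len(features))  # 3
print(len(name_map))  # 2
```

Three features go in, but only two entries come out, because the second "중구" silently replaces the first.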
Use a list instead of a dictionary to save them all.
import json
import pandas as pd

# Load the GeoJSON; the with-block closes the file afterwards.
with open('skorea-municipalities-2018-geo.json', 'r') as f:
    Korean_municipalities = json.load(f)

munic_list_eng = []
for feature in Korean_municipalities['features']:
    feature['id'] = feature['properties']['name_eng']
    # Append a (Korean name, English name) pair; repeated names are kept.
    munic_list_eng.append((feature['properties']['name'], feature['id']))

df_munic = pd.DataFrame(munic_list_eng)
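To check that duplicates really explain the gap, you can count the names that repeat. This is a minimal sketch on made-up rows standing in for munic_list_eng; the column names `name` and `name_eng` are my own choice, not something taken from the file.

```python
import pandas as pd

# Made-up (name, name_eng) pairs standing in for munic_list_eng.
rows = [
    ("중구", "Jung-gu"),
    ("중구", "Jung-gu"),
    ("동구", "Dong-gu"),
    ("동구", "Dong-gu"),
    ("동구", "Dong-gu"),
    ("강남구", "Gangnam-gu"),
]
df = pd.DataFrame(rows, columns=["name", "name_eng"])

# keep=False marks every row whose name occurs more than once.
dupes = df[df["name"].duplicated(keep=False)]
print(dupes["name"].value_counts())

# Rows a dict keyed on name would lose = total rows - distinct names.
lost = len(df) - df["name"].nunique()
print(lost)  # 3
```

On the real data, the same subtraction should come out to 23 (250 - 227) if duplicate names are the whole story. If you later need a unique key, you would have to combine the name with something else from the feature properties, assuming the file carries such a field.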