I work with a large GeoJSON file (more than 1 GB) with the structure below. This is part of it:
{'type': 'FeatureCollection',
'crs': {'type': 'name', 'properties': {'name': 'EPSG:4326'}},
'features': [{'type': 'Feature',
'properties': {'date_create': '15.03.2008',
'statecd': '06',
'cc_date_approval': None,
'children': None,
'adate': '23.08.2017',
'cc_date_entering': '01.01.2014',
'rifr_cnt': None,
'parcel_build_attrs': None,
'rifr': None,
'sale_date': None,
'area_unit': '055',
'util_code': None,
'util_by_doc': None,
'area_value': 115558.0,
'application_date': None,
'sale': None,
'cad_unit': '383',
'kvartal': '69:3:11',
'parent_id': '69:3:11:248',
'sale_cnt': None,
'sale_doc_date': None,
'date_cost': None,
'category_type': '003008000000',
'rifr_dep': None,
'kvartal_cn': '69:03:0000011',
'parent_cn': '69:03:0000011:248',
'cn': '69:03:0000011:245',
'is_big': False,
'rifr_dep_info': None,
'sale_dep': None,
'sale_dep_uo': None,
'parcel_build': False,
'id': '69:3:11:245',
'address': '',
'area_type': '009',
'parcel_type': 'parcel',
'sale_doc_num': None,
'sale_doc_type': None,
'sale_price': None,
'cad_cost': 139698.06,
'fp': None,
'center': {'x': 33.14727379331379, 'y': 55.87764081906541}},
'geometry': {'type': 'MultiPolygon',
'coordinates': []},
I need to keep only the 'id' and 'area_value' properties of each feature, rename them, and delete all the others, so that the nested 'properties' dictionary contains only these two keys.
The rest of the data structure must stay intact, otherwise the program that consumes the file will not understand it.
So far I can only read the data, not write it back. This is the method I use. With pandas I get a pd.DataFrame, which I know how to filter and select from, but I don't know how to turn the result back into the original structure.
import json
import pandas as pd

f = 'data_file_name.json'
with open(f, 'r') as dff:
    data = json.load(dff)
# Flatten every entry of 'features' into one row of a DataFrame
df = pd.json_normalize(data, record_path=['features'], errors='ignore')
df
I also tried ijson, and I ran into the same problem there:
import ijson

def parse_json(json_filename):
    with open(json_filename, 'rb') as input_file:
        # load json iteratively
        parser = ijson.parse(input_file)
        for prefix, event, value in parser:
            if prefix == 'features.item.properties.id':
                id_val = value
            if prefix == 'features.item.properties.area_value':
                area_val = value
        # after the loop this prints the last id that was read
        print(id_val)
        # print('prefix={}, event={}, value={}'.format(prefix, event, value))

if __name__ == '__main__':
    parse_json('data_file_name.json')
Thanks in advance for any help!
CodePudding user response:
This answer assumes that the data is valid, properly structured GeoJSON.
To read GeoJSON data you can use the GeoPandas library:
import geopandas as gpd
gdf = gpd.read_file('data_file_name.json')
This loads the GeoJSON file into a GeoPandas GeoDataFrame, which is a pandas DataFrame with spatial analysis capabilities. You can read more in the GeoPandas documentation.
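For the case in the question (keep only 'id' and 'area_value' and rename them), the selection could look roughly like the sketch below. The helper name keep_and_rename and the new column names 'parcel_id' and 'area' are made up for illustration and are not part of GeoPandas. It also assumes the 'id' property is read as an ordinary column, which may depend on the I/O engine in use.

```python
import geopandas as gpd


def keep_and_rename(gdf, mapping):
    """Return a GeoDataFrame with only the mapped columns (renamed) plus geometry.

    mapping: {old_column_name: new_column_name}; the names are up to you.
    """
    # Always keep the active geometry column so the result stays spatial.
    columns = list(mapping) + [gdf.geometry.name]
    return gdf[columns].rename(columns=mapping)
```

With the gdf loaded above, a call such as slim = keep_and_rename(gdf, {'id': 'parcel_id', 'area_value': 'area'}) would give a frame with just the two renamed properties and the geometry, which can then be exported like any other GeoDataFrame.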
After you have finished your operations on the data, you can export it back to GeoJSON:
gdf.to_file('data_file_name.geojson', driver='GeoJSON')
This preserves the GeoJSON structure. If your further analysis uses other software, you can also save the data in other spatial formats such as GeoPackage or shapefile, or even as CSV with the geometries in WKT format.
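If GeoPandas is not available, roughly the same result can be sketched with only the standard json module, rewriting the 'properties' of each feature and leaving 'type', 'crs' and 'geometry' as they are. The function names and the new key names 'parcel_id' and 'area' below are hypothetical. Note that this loads the whole file into memory, which for a file over 1 GB may need several GB of RAM.

```python
import json

# Hypothetical mapping from the original property names to the new ones.
RENAME = {'id': 'parcel_id', 'area_value': 'area'}


def slim_feature_collection(data, rename=RENAME):
    """Return a copy of a FeatureCollection whose features keep only the renamed properties."""
    slim_features = []
    for feature in data['features']:
        props = feature.get('properties') or {}
        slim = dict(feature)  # shallow copy: keeps 'type', 'geometry', etc.
        slim['properties'] = {new: props.get(old) for old, new in rename.items()}
        slim_features.append(slim)
    result = dict(data)  # shallow copy: keeps 'type', 'crs', etc.
    result['features'] = slim_features
    return result


def slim_file(src, dst):
    """Read a GeoJSON file, slim its properties, and write the result to dst."""
    with open(src, 'r', encoding='utf-8') as f:
        data = json.load(f)
    with open(dst, 'w', encoding='utf-8') as f:
        json.dump(slim_feature_collection(data), f, ensure_ascii=False)
```

For example, slim_file('data_file_name.json', 'data_file_slim.json') would write a new file (the output name is only an example) and leave the original untouched.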