JSON in Python: rename and delete keys

I work with large GeoJSON data (more than 1 GB) with the following structure. This is part of it:

{'type': 'FeatureCollection',
 'crs': {'type': 'name', 'properties': {'name': 'EPSG:4326'}},
 'features': [{'type': 'Feature',
   'properties': {'date_create': '15.03.2008',
    'statecd': '06',
    'cc_date_approval': None,
    'children': None,
    'adate': '23.08.2017',
    'cc_date_entering': '01.01.2014',
    'rifr_cnt': None,
    'parcel_build_attrs': None,
    'rifr': None,
    'sale_date': None,
    'area_unit': '055',
    'util_code': None,
    'util_by_doc': None,
    'area_value': 115558.0,
    'application_date': None,
    'sale': None,
    'cad_unit': '383',
    'kvartal': '69:3:11',
    'parent_id': '69:3:11:248',
    'sale_cnt': None,
    'sale_doc_date': None,
    'date_cost': None,
    'category_type': '003008000000',
    'rifr_dep': None,
    'kvartal_cn': '69:03:0000011',
    'parent_cn': '69:03:0000011:248',
    'cn': '69:03:0000011:245',
    'is_big': False,
    'rifr_dep_info': None,
    'sale_dep': None,
    'sale_dep_uo': None,
    'parcel_build': False,
    'id': '69:3:11:245',
    'address': '',
    'area_type': '009',
    'parcel_type': 'parcel',
    'sale_doc_num': None,
    'sale_doc_type': None,
    'sale_price': None,
    'cad_cost': 139698.06,
    'fp': None,
    'center': {'x': 33.14727379331379, 'y': 55.87764081906541}},
   'geometry': {'type': 'MultiPolygon',
    'coordinates': []},

I need to keep the 'id' and 'area_value' properties of each feature, rename them, and delete all the others, so that the nested 'properties' dict contains only these two keys.

I must also preserve the rest of the data structure; otherwise the program that consumes the file will not understand it.

So far I can only read the data, not rewrite it. This is the method I use. With pandas I get a pd.DataFrame that I know how to filter and select from, but I don't know how to write the result back into the original structure.

import json

import pandas as pd

f = 'data_file_name.json'

# Load the whole file into memory as a dict
with open(f, 'r') as dff:
    data = json.load(dff)

# Flatten the 'features' list into one row per feature
df = pd.json_normalize(data, record_path=['features'], errors='ignore')
df

I also tried ijson, and I have the same problem there:

import ijson


def parse_json(json_filename):
    with open(json_filename, 'rb') as input_file:
        # Parse the JSON iteratively instead of loading it all at once
        parser = ijson.parse(input_file)
        for prefix, event, value in parser:
            if prefix == 'features.item.properties.id':
                id_val = value
                print(id_val)
            if prefix == 'features.item.properties.area_value':
                area_val = value
            # print('prefix={}, event={}, value={}'.format(prefix, event, value))


if __name__ == '__main__':
    parse_json('data_file_name.json')

Thank you for all!

CodePudding user response：

This answer works if you are sure that the data is GeoJSON and that it is structured properly.

To read GeoJSON data you can use the GeoPandas library:

import geopandas as gpd

gdf = gpd.read_file('data_file_name.json')

This loads the GeoJSON file into a GeoPandas GeoDataFrame, which is a pandas DataFrame with spatial analysis capabilities.
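As a rough sketch of the operations the question asks for, keeping two properties and renaming them might look like the following on a GeoDataFrame. The frame here is a tiny in-memory stand-in for the result of gpd.read_file, and the new column names 'parcel_id' and 'area' are hypothetical, chosen only for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Tiny in-memory stand-in for gpd.read_file('data_file_name.json');
# the point geometry is made up for the example.
gdf = gpd.GeoDataFrame(
    {
        "id": ["69:3:11:245"],
        "area_value": [115558.0],
        "statecd": ["06"],
    },
    geometry=[Point(33.147, 55.878)],
    crs="EPSG:4326",
)

# Keep only the wanted property columns plus the geometry column,
# then rename them. 'parcel_id' and 'area' are hypothetical names.
gdf = gdf[["id", "area_value", "geometry"]].rename(
    columns={"id": "parcel_id", "area_value": "area"}
)

print(list(gdf.columns))
```

The geometry column has to stay in the selection; without it the result is no longer a spatial frame.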

After you have performed your operations on the data, you can export it to GeoJSON:

gdf.to_file('data_file_name.geojson', driver='GeoJSON')

This will preserve the GeoJSON structure. If your further analysis uses other software, you can save the data in other spatial formats such as GeoPackage, shapefile, or even CSV with geometries in WKT format.
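If adding a dependency is not an option, the same rewrite can be sketched with only the standard json module, editing the loaded dict in place so that 'type', 'crs' and each 'geometry' stay untouched. This is a minimal sketch, not a tested solution for the real file: the helper slim_features and the new key names 'parcel_id' and 'area' are hypothetical, and it assumes the whole file fits in memory, which may not hold for more than 1 GB of JSON.

```python
import json


def slim_features(collection, rename):
    """Keep only the keys listed in `rename` in each feature's
    'properties', stored under their new names.

    Hypothetical helper; everything outside 'properties' is left as is.
    """
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        feature["properties"] = {
            new: props.get(old) for old, new in rename.items()
        }
    return collection


# Small stand-in for the dict returned by json.load(...)
data = {
    "type": "FeatureCollection",
    "crs": {"type": "name", "properties": {"name": "EPSG:4326"}},
    "features": [
        {
            "type": "Feature",
            "properties": {
                "id": "69:3:11:245",
                "area_value": 115558.0,
                "statecd": "06",
            },
            "geometry": {"type": "MultiPolygon", "coordinates": []},
        }
    ],
}

# 'parcel_id' and 'area' are hypothetical target names.
result = slim_features(data, {"id": "parcel_id", "area_value": "area"})
print(json.dumps(result["features"][0]["properties"]))
```

The result can then be written back with json.dump(result, out_file), which gives a file with the same outer structure and only the two renamed keys in each feature's properties.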