Prevent Python CSV to JSON for loop iteration from overwriting previous entry

Time:12-24

I have a pretty basic Python for loop that I'm using to remap a CSV file into GeoJSON format. My script looks like this:

def make_json(csvFilePath, jsonFilePath):

    # create a dictionary
    data = {
            "type": "FeatureCollection",
            "features": []
    }

    feature = {
            "type": "Feature",
            "geometry": {
                    "type": "Point",
                    "coordinates": []
            },
            "properties": {}
    }

    # Open a csv reader called DictReader
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
    
    # Convert each row into a dictionary
    # and add it to data
        for rows in csvReader:

            feature['geometry']['coordinates'] = [float(rows['s_dec']),float(rows['s_ra'])]
            feature['properties'] = rows
            data['features'].append(feature)

    # Open a json writer, and use the json.dumps()
    # function to dump data
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(data, indent=4))

However, this is causing each new row entry to overwrite the previous. My output looks like this:

{
"type": "FeatureCollection",
"features": [
    {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [
                -67.33190277777777,
                82.68714791666666
            ]
        },
        "properties": {
            "dataproduct_type": "image",
            "s_ra": "82.68714791666666",
            "s_dec": "-67.33190277777777",
            "t_min": "59687.56540044768",
            "t_max": "59687.5702465162",
            "s_region": "POLYGON 82.746588309 -67.328433557 82.78394862 -67.338513769"
        }
    },
    {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [
                -67.33190277777777,
                82.68714791666666
            ]
        },
        "properties": {
            "dataproduct_type": "image",
            "s_ra": "82.68714791666666",
            "s_dec": "-67.33190277777777",
            "t_min": "59687.56540044768",
            "t_max": "59687.5702465162",
            "s_region": "POLYGON 82.746588309 -67.328433557 82.78394862 -67.338513769"
        }
    }
]}

Any thoughts on what I'm doing wrong here?

CodePudding user response:

I don't have the source file to easily test this code, but I think your issue is that the same feature dict is appended on every iteration, so every entry in the list refers to one object. Each row needs its own (deep) copy.
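To make the symptom concrete, here is a minimal sketch that reproduces it without any CSV file. The two coordinate pairs are made-up sample values, and the dict is trimmed down from the one in the question.

```python
# One dict, created once outside the loop (as in the question).
feature = {"geometry": {"coordinates": []}, "properties": {}}
features = []

for coords in ([1.0, 2.0], [3.0, 4.0]):  # made-up sample values
    feature["geometry"]["coordinates"] = coords
    features.append(feature)  # appends a reference to the same dict, not a copy

# Both list entries are the very same object, so both show the last row.
same_object = features[0] is features[1]
first_coords = features[0]["geometry"]["coordinates"]
print(same_object, first_coords)  # True [3.0, 4.0]
```

Appending a dict to a list never copies it, so mutating it on the next iteration changes what every earlier entry displays.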

One way to get a deep copy is a json.dumps / json.loads round trip. It is probably not very efficient, so if execution time is an issue, feel free to find another way to produce a deep copy.

        # Serialize the template once, outside the loop
        feature_string = json.dumps(feature)
        for row in csvReader:
            # json.loads returns a brand-new nested dict each time
            buf = json.loads(feature_string)
            buf["geometry"]["coordinates"] = [
                float(row["s_dec"]),
                float(row["s_ra"]),
            ]
            buf["properties"] = row
            data["features"].append(buf)
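If the JSON round trip is a concern, the standard library's copy.deepcopy is another way to get an independent copy per row. This is only a sketch: the names template and build_features are hypothetical helpers introduced here, not part of the original script, and rows stands for any iterable of dicts such as a csv.DictReader.

```python
import copy

# Template dict, defined once (same shape as `feature` in the question).
template = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": []},
    "properties": {},
}


def build_features(rows):
    """Return one independent feature dict per row (hypothetical helper)."""
    features = []
    for row in rows:
        # deepcopy duplicates the nested "geometry" dict as well,
        # so rows never share any mutable state with each other.
        feature = copy.deepcopy(template)
        feature["geometry"]["coordinates"] = [
            float(row["s_dec"]),
            float(row["s_ra"]),
        ]
        feature["properties"] = row
        features.append(feature)
    return features
```

Note that a shallow copy (template.copy()) would not be enough here, because the nested "geometry" dict would still be shared between all features.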

To save to a file, you can use json.dump directly:

with open("json_output.json", encoding="utf-8", mode="w") as fout:
    json.dump(data, fout, indent=4)

CodePudding user response:

A separate data structure is needed for each record. Otherwise the list of features will contain only repeated references to the same single dict.

def make_json(csvFilePath, jsonFilePath):

    # create a dictionary
    data = {
            "type": "FeatureCollection",
            "features": []
    }

# -- cut this ---
#    feature = {
#            "type": "Feature",
#            "geometry": {
#                    "type": "Point",
#                    "coordinates": []
#            },
#            "properties": {}
#    }

    # Open a csv reader called DictReader
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
    
    # Convert each row into a dictionary
    # and add it to data
        for rows in csvReader:

# -- and paste it here --
            feature = {
                ...
            }

            feature['geometry']['coordinates'] = [float(rows['s_dec']),float(rows['s_ra'])]
            feature['properties'] = rows
            data['features'].append(feature)

(The code could then be simplified a little.)
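One possible simplification, as a sketch rather than the only way to do it: build each feature as a dict literal inside the loop, so no template and no copying are needed. The column names s_dec and s_ra and the coordinate order are taken from the question as-is; newline="" on the input file follows the csv module's recommendation.

```python
import csv
import json


def make_json(csvFilePath, jsonFilePath):
    features = []
    with open(csvFilePath, encoding="utf-8", newline="") as csvf:
        for row in csv.DictReader(csvf):
            # A fresh dict literal is created on every iteration,
            # so each feature is an independent object.
            features.append({
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [float(row["s_dec"]), float(row["s_ra"])],
                },
                "properties": row,
            })

    data = {"type": "FeatureCollection", "features": features}
    with open(jsonFilePath, "w", encoding="utf-8") as jsonf:
        json.dump(data, jsonf, indent=4)
```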
