How to add data into a json key from a csv file using python

Time:12-01

I am trying to add data to a JSON key from a CSV file while keeping the original structure intact. The JSON file looks like this:

{
  "inputDocuments": {
    "gcsDocuments": {
      "documents": [
        {
          "gcsUri": "gs://test/.PDF",
          "mimeType": "application/pdf"
        }
      ]
    }
  },
  "documentOutputConfig": {
    "gcsOutputConfig": {
      "gcsUri": "gs://test"
    }
  },
  "skipHumanReview": false
}

The CSV file I am trying to load has the following structure: [image: CSV file structure]

Note that mimeType is not included in the CSV file.

I already have code that does this, but it's a bit manual. I am looking for a simpler approach that only requires a CSV file with the values, which are then added into the JSON structure. The expected outcome should look like this:

{
      "inputDocuments": {
        "gcsDocuments": {
          "documents": [
            {
              "gcsUri": "gs://sampleinvoices/Handwritten/1.pdf",
              "mimeType": "application/pdf"
            },
            {
              "gcsUri": "gs://sampleinvoices/Handwritten/2.pdf",
              "mimeType": "application/pdf"
            }
          ]
        }
      },
      "documentOutputConfig": {
        "gcsOutputConfig": {
          "gcsUri": "gs://test"
        }
      },
      "skipHumanReview": false
}

The code I am currently using, which is a bit manual, looks like this:

import json

# Function to append a document entry to the JSON file
def write_json(new_data, filename='keyvalue.json'):
    # 'r+' opens the file for both reading and writing
    with open(filename, 'r+') as file:
        # Load existing data into a dict.
        file_data = json.load(file)
        # Append new_data to the documents list
        file_data["inputDocuments"]["gcsDocuments"]["documents"].append(new_data)
        # Move back to the start of the file before rewriting it.
        file.seek(0)
        # Convert back to JSON, then drop any leftover bytes from the old content.
        json.dump(file_data, file, indent=4)
        file.truncate()

# Python object to be appended
y = {
    "gcsUri": "gs://test/.PDF",
    "mimeType": "application/pdf"
}

write_json(y)

CodePudding user response:

I would suggest something like this:

import pandas as pd
import json
from pathlib import Path

df_csv = pd.read_csv("your_data.csv")
json_file = Path("your_data.json")
json_data = json.loads(json_file.read_text())

documents = [
    {
        "gcsUri": cell,
        "mimeType": "application/pdf"
    }
    for cell in df_csv["column_name"]
]
json_data["inputDocuments"]["gcsDocuments"]["documents"] = documents

# indent=2 keeps the file pretty-printed like the original
json_file.write_text(json.dumps(json_data, indent=2))

Probably you should split this into separate functions, but it should communicate the general idea.
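
For reference, here is a minimal sketch of that split into functions, using the standard library csv module instead of pandas. It assumes the CSV has a header row and that the URI column is named gcsUri; the real column name is not shown in the question, so adjust it as needed. The function names read_uris, build_documents and update_request are hypothetical, chosen only for illustration.

```python
import csv
import json
from pathlib import Path


def read_uris(csv_path, column="gcsUri"):
    """Return the non-empty values of one CSV column.

    The column name "gcsUri" is an assumption; pass the real header name.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        values = ((row.get(column) or "").strip() for row in reader)
        return [value for value in values if value]


def build_documents(uris, mime_type="application/pdf"):
    """Build one document entry per URI, adding the fixed mimeType."""
    return [{"gcsUri": uri, "mimeType": mime_type} for uri in uris]


def update_request(json_path, csv_path, column="gcsUri"):
    """Replace the documents list in the JSON file with entries from the CSV."""
    json_file = Path(json_path)
    request = json.loads(json_file.read_text(encoding="utf-8"))
    documents = build_documents(read_uris(csv_path, column))
    request["inputDocuments"]["gcsDocuments"]["documents"] = documents
    # indent=2 keeps the file human-readable; all other keys are left untouched
    json_file.write_text(json.dumps(request, indent=2), encoding="utf-8")
    return request


# Example call (file names are placeholders):
# update_request("your_data.json", "your_data.csv", column="gcsUri")
```

Like the answer above, this replaces the existing documents list rather than appending to it, which matches the expected output in the question. To keep the existing entries, use .extend(documents) on the list instead of assigning to it.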
