Create a dataframe from multiple JSON files with unique keys

Time:08-07

I have a JSON file that looks something like this:

translation_map:    
    str_empty:  
        nl: {}
        bn: {}
    str_6df066da34e6:   
        nl: 
            value:  "value 1"
            publishedAt:    16438
            audio:  "value1474.mp3"
        bn: 
            value:  "value2"
            publishedAt:    164322907
    str_9036dfe313457:  
        nl: 
            value:  "value3"
            publishedAt:    1647611912
            audio:  "value3615.mp3"
        bn: 
            value:  "value4"
            publishedAt:    1238641456

I am trying to take some of the fields and put them into a dataframe that I can later export to CSV. However, I am having trouble with the unique keys. I have this code, which works for a single key:

import os, json
import pandas as pd

# json files
path_to_json = 'C:\\Users\\bob\\Videos\\Captures'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
print(json_files)

# define my pandas Dataframe columns
jsons_data = pd.DataFrame(columns=['transcription', 'meaning', 'sound'])

for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json_text = json.load(json_file)

        transcription = json_text['translation_map']['str_6df066da34e6']['nl']['value']
        sound = json_text['translation_map']['str_6df066da34e6']['nl']['audio']
        meaning = json_text['translation_map']['str_6df066da34e6']['bn']['value']

        jsons_data.loc[index] = [transcription, meaning, sound]

# look at json data in our DataFrame
print(jsons_data)

However, I am not sure how to loop through all of the unique keys with this.

Here is another example of the JSON that looks more like actual JSON, I guess:

  "id": "ob26",
  "class": "objective",
  "type": "objective",
  "content": {
    "title": "s413",
    "image": null,
    "image_svg": "ht.svg",
    "images": {
      "thumbnail_256": "256.jpg"
    },
    "description": "str0",
    "color_1": "FDd0BA",
    "color_2": "FF8240",
    "bucket": 2
  },
"structure": [ a lot of things],
"translation_map": {
        "str_empty": {
          "nl": {},
          "bn": {}
        },
        "str_9asihdu7dcb": {
          "nl": {
            "value": "value2",
            "audio": "8007.mp3"
          },
          "bn": {
            "value": "value4"
          }
        },
        "str_f4c8ashuh524": {
          "nl": {
            "value": "value1",
            "audio": "8026.mp3"
          },
          "bn": {
            "value": "Maet."
          }
        },
        "str_39asjashfk6": {
          "nl": {
            "value": "value5",
            "audio": "40.mp3"
          },
          "bn": {
            "value": "value4"
          }
        },

CodePudding user response:

Use a nested loop and dict.values() like so:

json_text = {
    "translation_map": {
        "str_9asihdu7dcb": {
            "nl": {
                "value": "value2",
                "audio": "8007.mp3"
            },
            "bn": {
                "value": "value4"
            }
        },
        "str_f4c8ashuh524": {
            "nl": {
                "value": "value1",
                "audio": "8026.mp3"
            },
            "bn": {
                "value": "Maet."
            }
        },
        "str_39asjashfk6": {
            "nl": {
                "value": "value5",
                "audio": "40.mp3"
            },
            "bn": {
                "value": "value4"
            }
        }
    }
}

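# Index "translation_map" directly: iterating over every top-level key
# would fail on non-dict values such as "id" in the full file.
for v in json_text["translation_map"].values():
    nl = v.get("nl", {})
    bn = v.get("bn", {})

    # Fall back to "empty" when a language entry or one of its fields is missing.
    transcription = nl.get("value", "empty")
    sound = nl.get("audio", "empty")
    meaning = bn.get("value", "empty")

    print(transcription, sound, meaning)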

Output

value2 8007.mp3 value4
value1 8026.mp3 Maet.
value5 40.mp3 value4
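
(If the "str_empty" entry from the question is included in the input, the loop also prints a line reading: empty empty empty)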
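
To get from the printed values to the dataframe the question asks for, one option is to collect one row per key across all files and build the DataFrame once at the end. This is a minimal sketch, assuming every file is a JSON object with a top-level "translation_map". The helper names extract_rows and build_dataframe are made up for illustration. Skipping entries that have neither nl nor bn data (such as "str_empty") is a choice made here, not part of the answer above.

```python
import json
from pathlib import Path

import pandas as pd


def extract_rows(data):
    """Return one row dict per key found under "translation_map"."""
    rows = []
    for entry in data.get("translation_map", {}).values():
        nl = entry.get("nl", {})
        bn = entry.get("bn", {})
        if not nl and not bn:
            continue  # assumption: placeholder entries like "str_empty" are skipped
        rows.append({
            "transcription": nl.get("value"),  # None if the field is missing
            "meaning": bn.get("value"),
            "sound": nl.get("audio"),
        })
    return rows


def build_dataframe(folder):
    """Collect the rows from every .json file in `folder` into one DataFrame."""
    rows = []
    for json_path in sorted(Path(folder).glob("*.json")):
        with open(json_path, encoding="utf-8") as json_file:
            rows.extend(extract_rows(json.load(json_file)))
    # Building the frame once from a list avoids growing it row by row with .loc.
    return pd.DataFrame(rows, columns=["transcription", "meaning", "sound"])
```

With the folder from the question, build_dataframe(path_to_json).to_csv("translations.csv", index=False) would then write the CSV. The output file name is only an example.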