Key Error with json file when I try to extract specific values

I want to make a dataframe with all these elements as columns with the following order:

from 'playlists': 'name', 'collaborative', 'pid', 'modified_at', 'num_tracks', 'num_albums', 'num_followers', 'num_edits', 'duration_ms', 'num_artists'

from 'tracks': 'pos', 'artist_name', 'track_uri', 'artist_uri', 'track_name', 'album_uri', 'duration_ms', 'album_name'

from 'info': 'generated_on', 'slice', 'version'

A part of the json file is the following:

{
    "info": {
        "generated_on": "2017-12-03 08:41:42.057563", 
        "slice": "0-999", 
        "version": "v1"
    }, 
    "playlists": [
        {
            "name": "Throwbacks", 
            "collaborative": "false", 
            "pid": 0, 
            "modified_at": 1493424000, 
            "num_tracks": 52, 
            "num_albums": 47, 
            "num_followers": 1, 
            "tracks": [
                {
                    "pos": 0, 
                    "artist_name": "Missy Elliott", 
                    "track_uri": "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI", 
                    "artist_uri": "spotify:artist:2wIVse2owClT7go1WT98tk", 
                    "track_name": "Lose Control (feat. Ciara & Fat Man Scoop)", 
                    "album_uri": "spotify:album:6vV5UrXcfyQD1wu4Qo2I9K", 
                    "duration_ms": 226863, 
                    "album_name": "The Cookbook"
                }, 
                {
                    "pos": 1, 
                    "artist_name": "Britney Spears", 
                    "track_uri": "spotify:track:6I9VzXrHxO9rA9A5euc8Ak", 
                    "artist_uri": "spotify:artist:26dSoYclwsYLMAKD3tpOr4", 
                    "track_name": "Toxic", 
                    "album_uri": "spotify:album:0z7pVBGOD7HCIB7S8eLkLI", 
                    "duration_ms": 198800, 
                    "album_name": "In The Zone"
                }, 
            ], 
            "num_edits": 6, 
            "duration_ms": 11532414, 
            "num_artists": 37
        },

When I run the program it gives the following error:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3208/3949258436.py in <module>
     16 
     17 
---> 18 data= pd.json_normalize(js['playlists'],  ['name', 'collaborative', 'pid', 'modified_at', 'num_tracks', 'num_albums',
     19                                                     'num_followers', 'tracks', 'num_edits',  'num_artists'], js['info'])
     20 

~\anaconda3\lib\site-packages\pandas\io\json\_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    293 
    294     meta_vals: DefaultDict = defaultdict(list)
--> 295     meta_keys = [sep.join(val) for val in _meta]
    296 
    297     def _recursive_extract(data, path, seen_meta, level=0):

~\anaconda3\lib\site-packages\pandas\io\json\_normalize.py in <listcomp>(.0)
    293 
    294     meta_vals: DefaultDict = defaultdict(list)
--> 295     meta_keys = [sep.join(val) for val in _meta]
    296 
    297     def _recursive_extract(data, path, seen_meta, level=0):

TypeError: sequence item 0: expected str instance, dict found

Here is my code:

import json
import pandas as pd
import os



path = 'C:\\Users\\sotir\\Desktop\\machinedataset'

filenames = os.listdir(path)
for filename in sorted(filenames):
    if filename.startswith("mpd.slice.") and filename.endswith(".json"):
        fullpath = os.sep.join((path, filename))
        f = open(fullpath)
        js = json.load(f)
        f.close()


data= pd.json_normalize(js['playlists'],  ['name', 'collaborative', 'pid', 'modified_at', 'num_tracks', 'num_albums',
                                                    'num_followers', 'tracks', 'num_edits',  'num_artists'], js['info'])

CodePudding user response：

To solve your immediate error, in your pd.json_normalize() call, change the last argument from:

['info']

to:

js['info']

You'll get further errors, but that is fodder for a new question.
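
For context, a KeyError on 'info' is what I would expect whenever meta=['info'] is passed: meta paths are looked up inside each playlist record, while 'info' sits at the top level of the file, next to 'playlists'. Here is a minimal sketch of that behaviour. It assumes record_path='tracks' (my guess at the intended call, not the asker's code) and uses a trimmed-down stand-in for the file:

```python
import pandas as pd

# Trimmed-down stand-in for one slice file (assumption: same layout as the excerpt).
js = {
    "info": {"version": "v1"},
    "playlists": [
        {"name": "Throwbacks", "pid": 0,
         "tracks": [{"pos": 0, "track_name": "Toxic"}]},
    ],
}

error = None
try:
    # 'info' is not a key of the playlist dicts, so the meta lookup fails.
    pd.json_normalize(js["playlists"], record_path="tracks", meta=["info"])
except KeyError as exc:
    error = exc

print(type(error).__name__, error)
```

The exact wording of the message depends on the pandas version, but the exception type is KeyError in each case.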

CodePudding user response：

Answer for Revised Question:

According to https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html, the third argument to pd.json_normalize() is meta, which defaults to None.

Looking at that doc page (which you definitely need to review), we see:

pandas.json_normalize
pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)
Normalize semi-structured JSON data into a flat table.

Parameters:
data: dict or list of dicts
    Unserialized JSON objects.

record_path: str or list of str, default None
    Path in each object to list of records. If not passed, data will be assumed to be an array of records.

meta: list of paths (str or list of str), default None
    Fields to use as metadata for each record in resulting table.

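According to the docs, meta is meant to be a list of paths, where each path is a string or a list of strings. You're passing in a dict (js['info']), so pandas tries to join it with sep as if it were a path of key names, and that is causing the TypeError. Likewise, record_path is the path to the list of records inside each object ('tracks' here), not a list of column names. You need to read the docs to understand how to make the call to json_normalize().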
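
Here is a minimal sketch of a call that matches those docs, assuming 'tracks' is the record path and the playlist-level fields are the metadata. The playlist_ and info_ prefixes are my own naming choice, used so that the playlist-level duration_ms does not clash with the track-level one. js is a stand-in built from the excerpt in the question; in your loop it would come from json.load().

```python
import pandas as pd

# Stand-in for one slice file, using the values from the excerpt in the question.
js = {
    "info": {"generated_on": "2017-12-03 08:41:42.057563", "slice": "0-999", "version": "v1"},
    "playlists": [
        {
            "name": "Throwbacks", "collaborative": "false", "pid": 0,
            "modified_at": 1493424000, "num_tracks": 52, "num_albums": 47,
            "num_followers": 1,
            "tracks": [
                {"pos": 0, "artist_name": "Missy Elliott",
                 "track_uri": "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI",
                 "artist_uri": "spotify:artist:2wIVse2owClT7go1WT98tk",
                 "track_name": "Lose Control (feat. Ciara & Fat Man Scoop)",
                 "album_uri": "spotify:album:6vV5UrXcfyQD1wu4Qo2I9K",
                 "duration_ms": 226863, "album_name": "The Cookbook"},
                {"pos": 1, "artist_name": "Britney Spears",
                 "track_uri": "spotify:track:6I9VzXrHxO9rA9A5euc8Ak",
                 "artist_uri": "spotify:artist:26dSoYclwsYLMAKD3tpOr4",
                 "track_name": "Toxic",
                 "album_uri": "spotify:album:0z7pVBGOD7HCIB7S8eLkLI",
                 "duration_ms": 198800, "album_name": "In The Zone"},
            ],
            "num_edits": 6, "duration_ms": 11532414, "num_artists": 37,
        }
    ],
}

playlist_fields = ["name", "collaborative", "pid", "modified_at", "num_tracks",
                   "num_albums", "num_followers", "num_edits", "duration_ms", "num_artists"]
track_fields = ["pos", "artist_name", "track_uri", "artist_uri", "track_name",
                "album_uri", "duration_ms", "album_name"]

# One row per track; the playlist-level fields are repeated on every row.
data = pd.json_normalize(
    js["playlists"],
    record_path="tracks",
    meta=playlist_fields,
    meta_prefix="playlist_",  # avoids the duration_ms name clash
)

# 'info' lives outside 'playlists', so add its values as constant columns.
for key, value in js["info"].items():
    data[f"info_{key}"] = value

# Reorder: playlist columns, then track columns, then info columns.
ordered = ([f"playlist_{c}" for c in playlist_fields]
           + track_fields
           + [f"info_{k}" for k in js["info"]])
data = data[ordered]
print(data.shape)
```

For several slice files, one option is to build a frame like this per file inside the loop, collect the frames in a list, and combine them with pd.concat() at the end.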