I want to make a DataFrame with all of these elements as columns, in the following order:
from 'playlists': 'name', 'collaborative', 'pid', 'modified_at', 'num_tracks', 'num_albums', 'num_followers', 'num_edits', 'duration_ms', 'num_artists'
from 'tracks': 'pos', 'artist_name', 'track_uri', 'artist_uri', 'track_name', 'album_uri', 'duration_ms', 'album_name'
from 'info': 'generated_on', 'slice', 'version'
Here is part of the JSON file:
{
  "info": {
    "generated_on": "2017-12-03 08:41:42.057563",
    "slice": "0-999",
    "version": "v1"
  },
  "playlists": [
    {
      "name": "Throwbacks",
      "collaborative": "false",
      "pid": 0,
      "modified_at": 1493424000,
      "num_tracks": 52,
      "num_albums": 47,
      "num_followers": 1,
      "tracks": [
        {
          "pos": 0,
          "artist_name": "Missy Elliott",
          "track_uri": "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI",
          "artist_uri": "spotify:artist:2wIVse2owClT7go1WT98tk",
          "track_name": "Lose Control (feat. Ciara & Fat Man Scoop)",
          "album_uri": "spotify:album:6vV5UrXcfyQD1wu4Qo2I9K",
          "duration_ms": 226863,
          "album_name": "The Cookbook"
        },
        {
          "pos": 1,
          "artist_name": "Britney Spears",
          "track_uri": "spotify:track:6I9VzXrHxO9rA9A5euc8Ak",
          "artist_uri": "spotify:artist:26dSoYclwsYLMAKD3tpOr4",
          "track_name": "Toxic",
          "album_uri": "spotify:album:0z7pVBGOD7HCIB7S8eLkLI",
          "duration_ms": 198800,
          "album_name": "In The Zone"
        },
      ],
      "num_edits": 6,
      "duration_ms": 11532414,
      "num_artists": 37
    },
When I run the program it gives the following error:
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3208/3949258436.py in <module>
     16
     17
---> 18 data= pd.json_normalize(js['playlists'], ['name', 'collaborative', 'pid', 'modified_at', 'num_tracks', 'num_albums',
     19                          'num_followers', 'tracks', 'num_edits', 'num_artists'], js['info'])
     20

~\anaconda3\lib\site-packages\pandas\io\json\_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    293
    294     meta_vals: DefaultDict = defaultdict(list)
--> 295     meta_keys = [sep.join(val) for val in _meta]
    296
    297     def _recursive_extract(data, path, seen_meta, level=0):

~\anaconda3\lib\site-packages\pandas\io\json\_normalize.py in <listcomp>(.0)
    293
    294     meta_vals: DefaultDict = defaultdict(list)
--> 295     meta_keys = [sep.join(val) for val in _meta]
    296
    297     def _recursive_extract(data, path, seen_meta, level=0):

TypeError: sequence item 0: expected str instance, dict found
Here is my code:
import json
import pandas as pd
import os

path = 'C:\\Users\\sotir\\Desktop\\machinedataset'
filenames = os.listdir(path)
for filename in sorted(filenames):
    if filename.startswith("mpd.slice.") and filename.endswith(".json"):
        fullpath = os.sep.join((path, filename))
        f = open(fullpath)
        js = json.load(f)
        f.close()
        data = pd.json_normalize(js['playlists'], ['name', 'collaborative', 'pid', 'modified_at', 'num_tracks', 'num_albums',
                                 'num_followers', 'tracks', 'num_edits', 'num_artists'], js['info'])
Answer:
To solve your immediate error, change the last argument in your pd.json_normalize()
call from:
['info']
to:
js['info']
You'll get further errors, but that is fodder for a new question.
Answer for Revised Question:
According to https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html, the third positional argument to pd.json_normalize()
is meta (default meta=None).
Looking at that doc page (which you definitely need to review), we see:
pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)
Normalize semi-structured JSON data into a flat table.
Parameters:
    data: dict or list of dicts
        Unserialized JSON objects.
    record_path: str or list of str, default None
        Path in each object to list of records. If not passed, data will be assumed to be an array of records.
    meta: list of paths (str or list of str), default None
        Fields to use as metadata for each record in resulting table.
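To make the two parameters concrete, here is a small sketch on toy data shaped like js['playlists'] (the second playlist and its track are made up purely for illustration). record_path names the nested list that becomes the rows; meta lists fields of the enclosing object that get copied onto each of those rows. The last part passes a dict as meta, which reproduces the TypeError from the question.

```python
import pandas as pd

# Toy stand-in for js['playlists']; the second playlist is invented.
playlists = [
    {'name': 'Throwbacks', 'pid': 0,
     'tracks': [{'pos': 0, 'track_name': 'Lose Control'},
                {'pos': 1, 'track_name': 'Toxic'}]},
    {'name': 'Made-up playlist', 'pid': 1,
     'tracks': [{'pos': 0, 'track_name': 'Made-up song'}]},
]

# record_path='tracks': one output row per track.
# meta=['name', 'pid']: each track row also gets its playlist's name and pid.
df = pd.json_normalize(playlists, record_path='tracks', meta=['name', 'pid'])
print(df)

# meta must contain str paths (or lists of str). Passing a dict, as the
# question does with js['info'], fails when pandas joins the path into a
# column name.
error_message = None
try:
    pd.json_normalize(playlists, record_path='tracks',
                      meta={'slice': '0-999', 'version': 'v1'})
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```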
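According to the docs, meta
is meant to be a list of paths, where each path is a string or a list of strings. You're passing in a dict (js['info']), and that is what causes the error: as the traceback shows, pandas builds each metadata column name with sep.join(val), and str.join() fails when it is handed a dict instead of a string. You need to read the docs to understand how to make the call to json_normalize().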
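For reference, here is one possible call that yields the column order asked for, written as a sketch rather than the only way to do it. It assumes every playlist has all ten playlist-level keys (if some don't, errors='ignore' makes json_normalize fill them with NaN). Two details matter: record_path should be just 'tracks' (a list such as ['name', 'collaborative', ...] is read as one nested path), and 'duration_ms' exists at both the playlist and track level, which makes json_normalize raise a ValueError about a conflicting metadata name unless one side gets a prefix. The helper names slice_to_frame and load_slices, and the 'playlist_' prefix, are illustrative choices, not anything from pandas.

```python
import json
import os

import pandas as pd

PLAYLIST_COLS = ['name', 'collaborative', 'pid', 'modified_at', 'num_tracks',
                 'num_albums', 'num_followers', 'num_edits', 'duration_ms',
                 'num_artists']
TRACK_COLS = ['pos', 'artist_name', 'track_uri', 'artist_uri', 'track_name',
              'album_uri', 'duration_ms', 'album_name']
INFO_COLS = ['generated_on', 'slice', 'version']


def slice_to_frame(js):
    """Flatten one slice into one row per track (hypothetical helper)."""
    # Prefix playlist-level (meta) columns so the playlist's duration_ms
    # does not clash with the track's duration_ms.
    df = pd.json_normalize(js['playlists'], record_path='tracks',
                           meta=PLAYLIST_COLS, meta_prefix='playlist_')
    # 'info' describes the whole file, so repeat its values on every row.
    for key in INFO_COLS:
        df[key] = js['info'][key]
    ordered = ['playlist_' + c for c in PLAYLIST_COLS] + TRACK_COLS + INFO_COLS
    return df[ordered]


def load_slices(path):
    """Combine every mpd.slice.*.json file in `path` (hypothetical helper)."""
    frames = []
    for filename in sorted(os.listdir(path)):
        if filename.startswith('mpd.slice.') and filename.endswith('.json'):
            with open(os.path.join(path, filename)) as f:
                frames.append(slice_to_frame(json.load(f)))
    # pd.concat raises ValueError if no matching file was found.
    return pd.concat(frames, ignore_index=True)


# Trimmed copy of the excerpt from the question (two tracks only).
js = {
    'info': {'generated_on': '2017-12-03 08:41:42.057563',
             'slice': '0-999', 'version': 'v1'},
    'playlists': [{
        'name': 'Throwbacks', 'collaborative': 'false', 'pid': 0,
        'modified_at': 1493424000, 'num_tracks': 52, 'num_albums': 47,
        'num_followers': 1, 'num_edits': 6, 'duration_ms': 11532414,
        'num_artists': 37,
        'tracks': [
            {'pos': 0, 'artist_name': 'Missy Elliott',
             'track_uri': 'spotify:track:0UaMYEvWZi0ZqiDOoHU3YI',
             'artist_uri': 'spotify:artist:2wIVse2owClT7go1WT98tk',
             'track_name': 'Lose Control (feat. Ciara & Fat Man Scoop)',
             'album_uri': 'spotify:album:6vV5UrXcfyQD1wu4Qo2I9K',
             'duration_ms': 226863, 'album_name': 'The Cookbook'},
            {'pos': 1, 'artist_name': 'Britney Spears',
             'track_uri': 'spotify:track:6I9VzXrHxO9rA9A5euc8Ak',
             'artist_uri': 'spotify:artist:26dSoYclwsYLMAKD3tpOr4',
             'track_name': 'Toxic',
             'album_uri': 'spotify:album:0z7pVBGOD7HCIB7S8eLkLI',
             'duration_ms': 198800, 'album_name': 'In The Zone'},
        ],
    }],
}

df = slice_to_frame(js)
print(df)
```

Prefixing the playlist columns is an arbitrary choice; record_prefix='track_' would avoid the clash just as well by prefixing the track columns instead. load_slices(path) with the folder from the question would collect all slices, instead of overwriting data on every loop iteration.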