After 6 hours trying, I'm posting for help.
The json file contains playlists. Each playlist contains n
songs.
Here's a snippet.
file = {
"info": {
"generated_on": "2017-12-03 08:41:42.057563",
"slice": "0-999",
"version": "v1"
},
"playlists": [
{
"name": "Throwbacks",
"collaborative": "false",
"pid": 0,
"modified_at": 1493424000,
"num_tracks": 52,
"num_albums": 47,
"num_followers": 1,
"tracks": [
{
"pos": 0,
"artist_name": "Missy Elliott",
"track_uri": "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI",
"artist_uri": "spotify:artist:2wIVse2owClT7go1WT98tk",
"track_name": "Lose Control (feat. Ciara & Fat Man Scoop)",
"album_uri": "spotify:album:6vV5UrXcfyQD1wu4Qo2I9K",
"duration_ms": 226863,
"album_name": "The Cookbook"
},
{
"pos": 1,
"artist_name": "Britney Spears",
"track_uri": "spotify:track:6I9VzXrHxO9rA9A5euc8Ak",
"artist_uri": "spotify:artist:26dSoYclwsYLMAKD3tpOr4",
"track_name": "Toxic",
"album_uri": "spotify:album:0z7pVBGOD7HCIB7S8eLkLI",
"duration_ms": 198800,
"album_name": "In The Zone"
},
{
"pos": 2,
"artist_name": "Beyonc\u00e9",
"track_uri": "spotify:track:0WqIKmW4BTrj3eJFmnCKMv",
"artist_uri": "spotify:artist:6vWDO969PvNqNYHIOW5v0m",
"track_name": "Crazy In Love",
"album_uri": "spotify:album:25hVFAxTlDvXbx2X2QkUkE",
"duration_ms": 235933,
"album_name": "Dangerously In Love (Alben f\u00fcr die Ewigkeit)"
},
{
"pos": 3,
"artist_name": "Justin Timberlake",
"track_uri": "spotify:track:1AWQoqb9bSvzTjaLralEkT",
"artist_uri": "spotify:artist:31TPClRtHm23RisEBtV3X7",
"track_name": "Rock Your Body",
"album_uri": "spotify:album:6QPkyl04rXwTGlGlcYaRoW",
"duration_ms": 267266,
"album_name": "Justified"
}
]
}
]
}
I need everything from playlists
. This includes the n
tracks in each playlist. The rub is that the playlist tracks cannot be separated from the playlist ID. I need this normalized, so that I can export to MySQL because there are tons of these files.
This code gets me part of the way there, but leaves the tracks nested in a column.
with open(new_file) as f:
data = json.loads(f.read())
df_nested_list = pd.json_normalize(data, record_path = 'playlists')
df_nested_list
I've tried creating a separate dataframe for tracks, normalizing that, and concatinating that with the rest of the data from playlists
. I've alternated between json.load()
and json.loads()
. I've tried adding a meta
argument to the last line of the code. I've read the posts on here and they were a no go. Nothing is working. If I can't keep the playlist info and tracks together, it's not going to work.
Anyone have ideas?
CodePudding user response:
If data
is your dictionary from the question you can do:
df = pd.DataFrame(data["playlists"]).explode("tracks")
df = pd.concat([df, df.pop("tracks").apply(pd.Series)], axis=1).reset_index(
drop=True
)
print(df)
Prints:
name collaborative pid modified_at num_tracks num_albums num_followers pos artist_name track_uri artist_uri track_name album_uri duration_ms album_name
0 Throwbacks false 0 1493424000 52 47 1 0 Missy Elliott spotify:track:0UaMYEvWZi0ZqiDOoHU3YI spotify:artist:2wIVse2owClT7go1WT98tk Lose Control (feat. Ciara & Fat Man Scoop) spotify:album:6vV5UrXcfyQD1wu4Qo2I9K 226863 The Cookbook
1 Throwbacks false 0 1493424000 52 47 1 1 Britney Spears spotify:track:6I9VzXrHxO9rA9A5euc8Ak spotify:artist:26dSoYclwsYLMAKD3tpOr4 Toxic spotify:album:0z7pVBGOD7HCIB7S8eLkLI 198800 In The Zone
2 Throwbacks false 0 1493424000 52 47 1 2 Beyoncé spotify:track:0WqIKmW4BTrj3eJFmnCKMv spotify:artist:6vWDO969PvNqNYHIOW5v0m Crazy In Love spotify:album:25hVFAxTlDvXbx2X2QkUkE 235933 Dangerously In Love (Alben für die Ewigkeit)
3 Throwbacks false 0 1493424000 52 47 1 3 Justin Timberlake spotify:track:1AWQoqb9bSvzTjaLralEkT spotify:artist:31TPClRtHm23RisEBtV3X7 Rock Your Body spotify:album:6QPkyl04rXwTGlGlcYaRoW 267266 Justified