Loading and normalizing JSON and creating Pandas DF Not working-CodePudding

After 6 hours trying, I'm posting for help. The json file contains playlists. Each playlist contains n songs. Here's a snippet.

file = {
    "info": {
        "generated_on": "2017-12-03 08:41:42.057563", 
        "slice": "0-999", 
        "version": "v1"
    }, 
    "playlists": [
        {
            "name": "Throwbacks", 
            "collaborative": "false", 
            "pid": 0, 
            "modified_at": 1493424000, 
            "num_tracks": 52, 
            "num_albums": 47, 
            "num_followers": 1, 
            "tracks": [
                {
                    "pos": 0, 
                    "artist_name": "Missy Elliott", 
                    "track_uri": "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI", 
                    "artist_uri": "spotify:artist:2wIVse2owClT7go1WT98tk", 
                    "track_name": "Lose Control (feat. Ciara & Fat Man Scoop)", 
                    "album_uri": "spotify:album:6vV5UrXcfyQD1wu4Qo2I9K", 
                    "duration_ms": 226863, 
                    "album_name": "The Cookbook"
                }, 
                {
                    "pos": 1, 
                    "artist_name": "Britney Spears", 
                    "track_uri": "spotify:track:6I9VzXrHxO9rA9A5euc8Ak", 
                    "artist_uri": "spotify:artist:26dSoYclwsYLMAKD3tpOr4", 
                    "track_name": "Toxic", 
                    "album_uri": "spotify:album:0z7pVBGOD7HCIB7S8eLkLI", 
                    "duration_ms": 198800, 
                    "album_name": "In The Zone"
                }, 
                {
                    "pos": 2, 
                    "artist_name": "Beyonc\u00e9", 
                    "track_uri": "spotify:track:0WqIKmW4BTrj3eJFmnCKMv", 
                    "artist_uri": "spotify:artist:6vWDO969PvNqNYHIOW5v0m", 
                    "track_name": "Crazy In Love", 
                    "album_uri": "spotify:album:25hVFAxTlDvXbx2X2QkUkE", 
                    "duration_ms": 235933, 
                    "album_name": "Dangerously In Love (Alben f\u00fcr die Ewigkeit)"
                }, 
                {
                    "pos": 3, 
                    "artist_name": "Justin Timberlake", 
                    "track_uri": "spotify:track:1AWQoqb9bSvzTjaLralEkT", 
                    "artist_uri": "spotify:artist:31TPClRtHm23RisEBtV3X7", 
                    "track_name": "Rock Your Body", 
                    "album_uri": "spotify:album:6QPkyl04rXwTGlGlcYaRoW", 
                    "duration_ms": 267266, 
                    "album_name": "Justified"
                }
              ]
            }
          ]
        }

I need everything from playlists. This includes the n tracks in each playlist. The rub is that the playlist tracks cannot be separated from the playlist ID. I need this normalized, so that I can export to MySQL because there are tons of these files.

This code gets me part of the way there, but leaves the tracks nested in a column.

    with open(new_file) as f:
        data = json.loads(f.read())
        df_nested_list = pd.json_normalize(data, record_path = 'playlists')
    
df_nested_list

I've tried creating a separate dataframe for tracks, normalizing that, and concatinating that with the rest of the data from playlists. I've alternated between json.load() and json.loads(). I've tried adding a meta argument to the last line of the code. I've read the posts on here and they were a no go. Nothing is working. If I can't keep the playlist info and tracks together, it's not going to work.

Anyone have ideas?

CodePudding user response：

If data is your dictionary from the question you can do:

df = pd.DataFrame(data["playlists"]).explode("tracks")
df = pd.concat([df, df.pop("tracks").apply(pd.Series)], axis=1).reset_index(
    drop=True
)
print(df)

Prints:

         name collaborative  pid  modified_at  num_tracks  num_albums  num_followers  pos        artist_name                             track_uri                             artist_uri                                  track_name                             album_uri  duration_ms                                    album_name
0  Throwbacks         false    0   1493424000          52          47              1    0      Missy Elliott  spotify:track:0UaMYEvWZi0ZqiDOoHU3YI  spotify:artist:2wIVse2owClT7go1WT98tk  Lose Control (feat. Ciara & Fat Man Scoop)  spotify:album:6vV5UrXcfyQD1wu4Qo2I9K       226863                                  The Cookbook
1  Throwbacks         false    0   1493424000          52          47              1    1     Britney Spears  spotify:track:6I9VzXrHxO9rA9A5euc8Ak  spotify:artist:26dSoYclwsYLMAKD3tpOr4                                       Toxic  spotify:album:0z7pVBGOD7HCIB7S8eLkLI       198800                                   In The Zone
2  Throwbacks         false    0   1493424000          52          47              1    2            Beyoncé  spotify:track:0WqIKmW4BTrj3eJFmnCKMv  spotify:artist:6vWDO969PvNqNYHIOW5v0m                               Crazy In Love  spotify:album:25hVFAxTlDvXbx2X2QkUkE       235933  Dangerously In Love (Alben für die Ewigkeit)
3  Throwbacks         false    0   1493424000          52          47              1    3  Justin Timberlake  spotify:track:1AWQoqb9bSvzTjaLralEkT  spotify:artist:31TPClRtHm23RisEBtV3X7                              Rock Your Body  spotify:album:6QPkyl04rXwTGlGlcYaRoW       267266                                     Justified