Home > Software engineering >  Cleaning and Extracting JSON From Pandas columns
Cleaning and Extracting JSON From Pandas columns

Time:03-02

I have a DataFrame with the following structure:

id    year  name        homepage        
238   2022  Adventure  {'keywords': 'en', 'genres':[{"revenue": 1463, "name": "culture clash"}], 'runtime': 150, 'vote_average': 7}

But what I need is this structure

   id    year   name        keywords    revenue     name               runtime vote_average
   238   2022  Adventure     en         1460        culture clash      150     7

How can I do this?

CodePudding user response:

assuming you always want the first element of the list in homepage['genres'] and that homepage values are dicts, you can get the first element and then json normalize and concat that:

df['homepage'] = [{k:v[0] if isinstance(v, list) else v for k,v in d.items()} for d in df['homepage']]
pd.concat([df[['id', 'year', 'name']], pd.json_normalize(df['homepage'].values)], axis=1)

CodePudding user response:

The idea is to json_normalize "homepage" column and join it back to df. You can pass meta and the record path directly into json_normalize as parameters:

out = (df.join(pd.json_normalize(df['homepage'], record_path='genres', 
                                 meta=['keywords', 'runtime', 'vote_average']), 
               lsuffix='', rsuffix='_genres')
       .drop(columns='homepage'))

Output:

    id  year       name keywords  revenue    name_genres runtime vote_average
0  238  2022  Adventure       en     1463  culture clash     150            7
  • Related