I have a DataFrame with the following structure:
id year name homepage
238 2022 Adventure {'keywords': 'en', 'genres':[{"revenue": 1463, "name": "culture clash"}], 'runtime': 150, 'vote_average': 7}
But what I need is this structure
id year name keywords revenue name runtime vote_average
238 2022 Adventure en 1460 culture clash 150 7
How can I do this?
CodePudding user response:
assuming you always want the first element of the list in homepage['genres'] and that homepage values are dicts, you can get the first element and then json normalize and concat that:
df['homepage'] = [{k:v[0] if isinstance(v, list) else v for k,v in d.items()} for d in df['homepage']]
pd.concat([df[['id', 'year', 'name']], pd.json_normalize(df['homepage'].values)], axis=1)
CodePudding user response:
The idea is to json_normalize
"homepage" column and join
it back to df
. You can pass meta and the record path directly into json_normalize
as parameters:
out = (df.join(pd.json_normalize(df['homepage'], record_path='genres',
meta=['keywords', 'runtime', 'vote_average']),
lsuffix='', rsuffix='_genres')
.drop(columns='homepage'))
Output:
id year name keywords revenue name_genres runtime vote_average
0 238 2022 Adventure en 1463 culture clash 150 7