Home > Enterprise >  How to extract data from a Dictionary within Pandas Dataframe
How to extract data from a Dictionary within Pandas Dataframe

Time:06-10

I'm trying to figure out APIs and Pandas. So far I have requested the API and added the data into a DataFrame which is all fine but my problem is finding/manipulating the data.

My code is as follows, were the API gives a dictionary with of lists. Each list contains all the information of a particular show. Within these lists there are dictionaries with a few keys, such as 'title', 'genre', etc. So for a given show 'title', I want to print the whole list, i.e. all the information about that show or a specific part of that list maybe.

import requests
import pandas as pd

url = 'https://api.jikan.moe/v4/top/anime'
response = requests.get(url)
data = response.json()
df = pd.DataFrame(data['data'])

Name = input('name: ')

for i in df['title']:
    if i == Name:

I know how to find a specific piece of data with indexing, for example:

print(df['title'][0])

But i want to be able to generally search for all show information or potentialy a genre ect. So far i have tried iterating over the titles to find the name of the show, but then how do i print the list of information for that show?

Sorry for it being so long, hopefully its not too confusing. Appreciate all the help i can get, thank you.

CodePudding user response:

JSON does not always translate well to a pd.DataFrame because of nested dictionaries etc. A neat little trick is the built-in pandas function pd.json_normalize, which flattens the JSON data. A key which you would have to access through several sub-dictionaries (e.g. data['data'][0]['images']['jpg']['small_image_url']) becomes the column images.jpg.small_image_url in the flattened DataFrame. Hope this helps!

import requests
import pandas as pd

url = 'https://api.jikan.moe/v4/top/anime'
response = requests.get(url)
data = response.json()
df = pd.json_normalize(data['data'])

print(df.columns)
# Index(['mal_id', 'url', 'title', 'title_english', 'title_japanese',
#        'title_synonyms', 'type', 'source', 'episodes', 'status', 'airing',
#        'duration', 'rating', 'score', 'scored_by', 'rank', 'popularity',
#        'members', 'favorites', 'synopsis', 'background', 'season', 'year',
#        'producers', 'licensors', 'studios', 'genres', 'explicit_genres',
#        'themes', 'demographics', 'images.jpg.image_url',
#        'images.jpg.small_image_url', 'images.jpg.large_image_url',
#        'images.webp.image_url', 'images.webp.small_image_url',
#        'images.webp.large_image_url', 'trailer.youtube_id', 'trailer.url',
#        'trailer.embed_url', 'trailer.images.image_url',
#        'trailer.images.small_image_url', 'trailer.images.medium_image_url',
#        'trailer.images.large_image_url', 'trailer.images.maximum_image_url',
#        'aired.from', 'aired.to', 'aired.prop.from.day',
#        'aired.prop.from.month', 'aired.prop.from.year', 'aired.prop.to.day',
#        'aired.prop.to.month', 'aired.prop.to.year', 'aired.string',
#        'broadcast.day', 'broadcast.time', 'broadcast.timezone',
#        'broadcast.string'],
#       dtype='object')

df.query('title_english == "Gintama Season 2"')
# gives you the requested row

To print any given title in the terminal, you could select some relevant columns (see above) and then print the prettified string output of pandas:

requested_row = df.query(f'title_english == "{Name}"')  # e.g. Name = 'Gintama Season 2'
relevant_cols = ['title_english', 'status', 'season', 'year']
print(requested_row[relevant_cols].to_string(index=False))
# e.g.:
#     title_english           status  season    year
#  Gintama Season 2  Finished Airing  spring  2011.0

  • Related