I'm trying to figure out APIs and Pandas. So far I have requested the API and added the data into a DataFrame which is all fine but my problem is finding/manipulating the data.
My code is as follows, were the API gives a dictionary with of lists. Each list contains all the information of a particular show. Within these lists there are dictionaries with a few keys, such as 'title', 'genre', etc. So for a given show 'title', I want to print the whole list, i.e. all the information about that show or a specific part of that list maybe.
import requests
import pandas as pd
url = 'https://api.jikan.moe/v4/top/anime'
response = requests.get(url)
data = response.json()
df = pd.DataFrame(data['data'])
Name = input('name: ')
for i in df['title']:
if i == Name:
I know how to find a specific piece of data with indexing, for example:
print(df['title'][0])
But i want to be able to generally search for all show information or potentialy a genre ect. So far i have tried iterating over the titles to find the name of the show, but then how do i print the list of information for that show?
Sorry for it being so long, hopefully its not too confusing. Appreciate all the help i can get, thank you.
CodePudding user response:
JSON does not always translate well to a pd.DataFrame
because of nested dictionaries etc. A neat little trick is the built-in pandas function pd.json_normalize
, which flattens the JSON data. A key which you would have to access through several sub-dictionaries (e.g. data['data'][0]['images']['jpg']['small_image_url']
) becomes the column images.jpg.small_image_url
in the flattened DataFrame. Hope this helps!
import requests
import pandas as pd
url = 'https://api.jikan.moe/v4/top/anime'
response = requests.get(url)
data = response.json()
df = pd.json_normalize(data['data'])
print(df.columns)
# Index(['mal_id', 'url', 'title', 'title_english', 'title_japanese',
# 'title_synonyms', 'type', 'source', 'episodes', 'status', 'airing',
# 'duration', 'rating', 'score', 'scored_by', 'rank', 'popularity',
# 'members', 'favorites', 'synopsis', 'background', 'season', 'year',
# 'producers', 'licensors', 'studios', 'genres', 'explicit_genres',
# 'themes', 'demographics', 'images.jpg.image_url',
# 'images.jpg.small_image_url', 'images.jpg.large_image_url',
# 'images.webp.image_url', 'images.webp.small_image_url',
# 'images.webp.large_image_url', 'trailer.youtube_id', 'trailer.url',
# 'trailer.embed_url', 'trailer.images.image_url',
# 'trailer.images.small_image_url', 'trailer.images.medium_image_url',
# 'trailer.images.large_image_url', 'trailer.images.maximum_image_url',
# 'aired.from', 'aired.to', 'aired.prop.from.day',
# 'aired.prop.from.month', 'aired.prop.from.year', 'aired.prop.to.day',
# 'aired.prop.to.month', 'aired.prop.to.year', 'aired.string',
# 'broadcast.day', 'broadcast.time', 'broadcast.timezone',
# 'broadcast.string'],
# dtype='object')
df.query('title_english == "Gintama Season 2"')
# gives you the requested row
To print any given title in the terminal, you could select some relevant columns (see above) and then print the prettified string output of pandas:
requested_row = df.query(f'title_english == "{Name}"') # e.g. Name = 'Gintama Season 2'
relevant_cols = ['title_english', 'status', 'season', 'year']
print(requested_row[relevant_cols].to_string(index=False))
# e.g.:
# title_english status season year
# Gintama Season 2 Finished Airing spring 2011.0