I'm trying to obtain a table of data obtaining just the country, year and value from this World Bank API but I can't seem to filter for just the data I want. I've seen that these types of questions have already been asked but all the answers didn't seem to work.
Would really appreciate some help. Thank you!
import requests
import pandas as pd
from bs4 import BeautifulSoup
import json
url ="http://api.worldbank.org/v2/country/{}/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
country = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
"GRC""HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
"NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
"GBR","USA","VNM","ZWE"]
html={}
for i in country:
url_one = url.format(i)
html[i] = requests.get(url_one).json()
my_values=[]
for i in country:
value=html[i][1][0]['value']
my_values.append(value)
Edit
My data currently looks like this, I'm trying to extract the country name which is in '{'country': {'id': 'AO', 'value': 'Angola''}, the 'date' and the 'value'
Edit 2 Got the data I'm looking for but its repeated twice each
CodePudding user response:
Note: Assumed that it would be great to store information for all the years at once and not only for one year - Enables you to simply filter in later processing. Take a look, there is a missing "," between your countries "GRC""HUN"
There are different options to achieve your goal, just point with two of them in the right direction.
Option #1
Pick information needed from json response, create a reshaped dict and append()
it to my_values
:
for d in data[1]:
my_values.append({
'country':d['country']['value'],
'date':d['date'],
'value':d['value']
})
Example
import requests
import pandas as pd
url = 'http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?format=json'
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
"GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
"NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
"GBR","USA","VNM","ZWE"]
my_values = []
for country in countries:
data = requests.get(url %country).json()
try:
for d in data[1]:
my_values.append({
'country':d['country']['value'],
'date':d['date'],
'value':d['value']
})
except Exception as err:
print(f'[ERROR] country ==> {country} error ==> {err}')
pd.DataFrame(my_values).sort_values(['country', 'date'], ascending=True)
Option #2
Create a dataframes directly from the json response, concat them and make some adjustments on the final dataframe:
for d in data[1]:
my_values.append(pd.DataFrame(d))
...
pd.concat(my_values).loc[['value']][['country','date','value']].sort_values(['country', 'date'], ascending=True)
Output
country | date | value |
---|---|---|
Algeria | 1971 | 341.389 |
Algeria | 1972 | 442.678 |
Algeria | 1973 | 554.293 |
Algeria | 1974 | 818.008 |
Algeria | 1975 | 936.79 |
... | ... | ... |
Zimbabwe | 2016 | 1464.59 |
Zimbabwe | 2017 | 1235.19 |
Zimbabwe | 2018 | 1254.64 |
Zimbabwe | 2019 | 1316.74 |
Zimbabwe | 2020 | 1214.51 |
CodePudding user response:
Pandas read_json
method needs valid JSON str, path object or file-like object, but you put string.
https://pandas.pydata.org/docs/reference/api/pandas.read_json.html
Try this:
import requests
import pandas as pd
url = "http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
"GRC""HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
"NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
"GBR","USA","VNM","ZWE"]
datas = []
for country in countries:
data = requests.get(url %country).json()
try:
values = data[1][0]
datas.append(pd.DataFrame(values))
except Exception as err:
print(f"[ERROR] country ==> {country} with error ==> {err}")
df = pd.concat(datas)