Getting data from World Bank API using pandas

Time:01-05

I'm trying to build a table containing just the country, year and value from this World Bank API, but I can't seem to filter for just the data I want. I've seen that similar questions have already been asked, but none of the answers worked for me.

Would really appreciate some help. Thank you!

import requests
import pandas as pd
from bs4 import BeautifulSoup
import json
url ="http://api.worldbank.org/v2/country/{}/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
country = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
          "GRC""HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
          "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
          "GBR","USA","VNM","ZWE"]

html={}
for i in country:
 url_one = url.format(i)
 html[i] = requests.get(url_one).json()
my_values=[]
for i in country:

  value=html[i][1][0]['value']
  my_values.append(value)

Edit

My data currently looks like this. I'm trying to extract the country name, which is in 'country': {'id': 'AO', 'value': 'Angola'}, along with the 'date' and the 'value' data.

Edit 2

I got the data I'm looking for, but each row appears twice.

CodePudding user response:

Note: I assumed it would be useful to store the data for all years at once rather than for a single year, so that you can simply filter in later processing. Also take a look at your country list: there is a missing "," between "GRC" and "HUN" ("GRC""HUN").

There are different ways to achieve your goal; here are two of them to point you in the right direction.

Option #1

Pick the information you need from the JSON response, create a reshaped dict and append() it to my_values:

for d in data[1]:

    my_values.append({
        'country':d['country']['value'],
        'date':d['date'],
        'value':d['value']
    })

Example

import requests
import pandas as pd


url = 'http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?format=json'
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
          "GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
          "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
          "GBR","USA","VNM","ZWE"]
    
my_values = []
for country in countries:
    data = requests.get(url %country).json()

    try:
        for d in data[1]:
            my_values.append({
                'country':d['country']['value'],
                'date':d['date'],
                'value':d['value']
            })
    except Exception as err:
        print(f'[ERROR] country ==> {country} error ==> {err}')

pd.DataFrame(my_values).sort_values(['country', 'date'], ascending=True)
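
The reshaping step can also be tried without any network access. The following is only a sketch: extract_rows and has_more_pages are hypothetical helper names, the sample payload is hand-written with placeholder values (it is not real API output), and the layout [paging metadata, list of records] with a "pages" key is assumed from the responses used in this thread.

```python
import pandas as pd


def extract_rows(payload):
    """Return a list of country/date/value dicts from one decoded response."""
    # Assumed layout: [paging_metadata, [record, ...]]. A response without
    # a second element (e.g. for an unknown country code) yields no rows.
    if not isinstance(payload, list) or len(payload) < 2 or not payload[1]:
        return []
    return [
        {"country": d["country"]["value"], "date": d["date"], "value": d["value"]}
        for d in payload[1]
    ]


def has_more_pages(payload):
    """True if the (assumed) paging metadata reports more than one page."""
    meta = payload[0] if isinstance(payload, list) and payload else {}
    return int(meta.get("pages", 1)) > 1


# Hand-written sample payload; the values are placeholders, not real data
sample = [
    {"page": 1, "pages": 1, "per_page": 50, "total": 2},
    [
        {"country": {"id": "AO", "value": "Angola"}, "date": "2015", "value": 2.0},
        {"country": {"id": "AO", "value": "Angola"}, "date": "2014", "value": 1.0},
    ],
]

df = pd.DataFrame(extract_rows(sample)).sort_values(
    ["country", "date"], ignore_index=True
)
print(df)
print("more pages:", has_more_pages(sample))
```

If has_more_pages returns True for a real response, only part of the series was returned. As far as I know the API accepts a per_page query parameter to raise the page size, but check the API documentation before relying on it.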

Option #2

Create DataFrames directly from the JSON response, concat them and make some adjustments to the final DataFrame:

for d in data[1]:
    my_values.append(pd.DataFrame(d))

...

pd.concat(my_values).loc[['value']][['country','date','value']].sort_values(['country', 'date'], ascending=True)

Output

country   date  value
Algeria   1971  341.389
Algeria   1972  442.678
Algeria   1973  554.293
Algeria   1974  818.008
Algeria   1975  936.79
...       ...   ...
Zimbabwe  2016  1464.59
Zimbabwe  2017  1235.19
Zimbabwe  2018  1254.64
Zimbabwe  2019  1316.74
Zimbabwe  2020  1214.51
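
To see why Option #2 needs .loc[['value']], it may help to look at what pd.DataFrame does with a single record. The sketch below uses a hand-written record in the shape of one API entry (the number is a placeholder, not real data): because 'indicator' and 'country' are nested dicts with the keys 'id' and 'value', those keys become the row index, and the plain fields are repeated on both rows.

```python
import pandas as pd

# Hand-written record shaped like one API entry; the value is a placeholder
record = {
    "indicator": {"id": "NY.GDP.PCAP.CD", "value": "GDP per capita (current US$)"},
    "country": {"id": "AO", "value": "Angola"},
    "date": "2015",
    "value": 1234.5,
}

frame = pd.DataFrame(record)
print(frame)  # two rows, indexed 'id' and 'value'

# The 'value' row holds the readable names, so keep only that row
selected = frame.loc[["value"], ["country", "date", "value"]]
print(selected)
```

The 'id' row carries the codes (AO, NY.GDP.PCAP.CD) and the 'value' row carries the names, which is why only the 'value' row is kept.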

CodePudding user response:

The pandas read_json method needs a valid JSON string, path object or file-like object, but you passed it a plain string. See https://pandas.pydata.org/docs/reference/api/pandas.read_json.html

Try this:

import requests
import pandas as pd


url = "http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
          "GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
          "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
          "GBR","USA","VNM","ZWE"]

datas = []
for country in countries:
    data = requests.get(url %country).json()
    try:
        values = data[1][0]
        datas.append(pd.DataFrame(values))
    except Exception as err:
        print(f"[ERROR] country ==> {country} with error ==> {err}")

df = pd.concat(datas)
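
Note that pd.DataFrame on one of these records produces two rows (indexed 'id' and 'value') because 'indicator' and 'country' are nested dicts, so df ends up with two rows per country. That may be the duplication mentioned in Edit 2. One possible alternative, sketched here on hand-written records with placeholder values rather than real API output, is pd.json_normalize, which flattens the nested dicts into dotted column names and keeps one row per record:

```python
import pandas as pd

# Hand-written records shaped like API entries; the values are placeholders
records = [
    {
        "indicator": {"id": "NY.GDP.PCAP.CD", "value": "GDP per capita (current US$)"},
        "country": {"id": "AO", "value": "Angola"},
        "date": "2015",
        "value": 1.0,
    },
    {
        "indicator": {"id": "NY.GDP.PCAP.CD", "value": "GDP per capita (current US$)"},
        "country": {"id": "DZ", "value": "Algeria"},
        "date": "2015",
        "value": 2.0,
    },
]

# Nested keys become columns such as 'country.id' and 'country.value'
flat = pd.json_normalize(records)

df = (
    flat[["country.value", "date", "value"]]
    .rename(columns={"country.value": "country"})
    .sort_values(["country", "date"], ignore_index=True)
)
print(df)
```

In the loop above, this would mean collecting data[1][0] into a plain list and calling pd.json_normalize once at the end instead of pd.concat.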