How to read a specific table from a given url?

Time:12-14

I am new to Python and trying to download GDP per capita data by country. I am trying to read the data from this website: https://worldpopulationreview.com/countries/by-gdp

I tried to read the data, but pandas raised a "No tables found" error. I can see the data in r.text, yet pd.read_html cannot parse it as a table. How can I solve this and read the data?

MWE

import pandas as pd
import requests

url = "https://worldpopulationreview.com/countries/by-gdp"

r = requests.get(url)
raw_html = r.text  # I can see the data is here, but pd.read_html says no tables found
df_list = pd.read_html(raw_html)
print(len(df_list))

CodePudding user response:

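The data is embedded as JSON in <script id="__NEXT_DATA__" type="application/json"> and only rendered into a table by the browser, so the raw HTML contains no <table> element for pd.read_html to find. Parse the JSON out of that script tag instead: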

pd.json_normalize(
    json.loads(
        BeautifulSoup(
            requests.get(url).text, 'html.parser'  # name the parser explicitly
        ).select_one('#__NEXT_DATA__').text)['props']['pageProps']['data']
)

Example

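import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://worldpopulationreview.com/countries/by-gdp"

# The page ships its table data as JSON inside <script id="__NEXT_DATA__">
html = requests.get(url).text
script = BeautifulSoup(html, 'html.parser').select_one('#__NEXT_DATA__')
data = json.loads(script.text)['props']['pageProps']['data']

# Flatten the list of records into a DataFrame and keep the columns of interest
df = pd.json_normalize(data)
print(df[['continent', 'country', 'pop', 'imfGDP', 'unGDP', 'gdpPerCapita']])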

Output

     continent      country                           pop       imfGDP           unGDP  gdpPerCapita
0    North America  United States                  338290  2.08938e+13  18624475000000       61762.9
1    Asia           China                     1.42589e+06  1.48626e+13  11218281029298       10423.4
...  ...            ...                               ...          ...             ...           ...
210  Asia           Syria                         22125.2            0     22163075121       1001.71
211  North America  Turks and Caicos Islands       45.703            0       917550492       20076.4
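
The one-liner above fails with a fairly opaque error if the script tag or one of the nested keys is missing. As a rough sketch of the same extraction with explicit checks, the snippet below pulls the JSON out of the __NEXT_DATA__ script using only the standard library. The class name NextDataParser, the function extract_next_data_records, and the sample HTML are made up for illustration; the key path ('props', 'pageProps', 'data') is taken from the answer and may differ on other pages or change over time.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several pieces, so collect them all.
        if self._inside:
            self.chunks.append(data)


def extract_next_data_records(html, path=("props", "pageProps", "data")):
    """Return the object found at `path` inside the page's __NEXT_DATA__ JSON."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no <script id='__NEXT_DATA__'> block found")
    node = json.loads("".join(parser.chunks))
    for key in path:
        try:
            node = node[key]
        except (KeyError, TypeError) as exc:
            raise KeyError(f"key {key!r} missing; the page layout may have changed") from exc
    return node


# Made-up HTML that mimics the structure described in the answer.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"data": [{"country": "Testland", "gdpPerCapita": 1.5}]}}}'
    '</script></body></html>'
)
records = extract_next_data_records(sample_html)
print(records)  # [{'country': 'Testland', 'gdpPerCapita': 1.5}]
```

For the real page, pass requests.get(url).text to extract_next_data_records and hand the returned list to pd.json_normalize, as in the example above.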