I am new to Python and am trying to download GDP per capita data for countries. I am trying to read the data from this website: https://worldpopulationreview.com/countries/by-gdp

I tried to read the data, but I got a "No tables found" error. I can see the data in r.text, but pandas cannot read the table. How can I solve this and read the data?

MWE

import pandas as pd
import requests

url = "https://worldpopulationreview.com/countries/by-gdp"

r = requests.get(url)
raw_html = r.text  # I can see the data is here, but pd.read_html says no tables found
df_list = pd.read_html(raw_html)
print(len(df_list))

Answer

The data is embedded as JSON in <script id="__NEXT_DATA__" type="application/json"> and is rendered into a table only by the browser, so the raw HTML contains no <table> for pd.read_html to find. Parse that JSON instead:

pd.json_normalize(
    json.loads(
        BeautifulSoup(
            requests.get(url).text, 'html.parser'
        ).select_one('#__NEXT_DATA__').text)['props']['pageProps']['data']
)
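
To see the mechanism without touching the network, here is a minimal offline sketch. The HTML string is a made-up stand-in for the page (the two rows and their values are placeholders, not real figures), and it uses the standard library's html.parser in place of BeautifulSoup so that it runs with no extra installs. ScriptJSONExtractor and extract_next_data are hypothetical names chosen for this illustration.

```python
import json
from html.parser import HTMLParser

# Made-up stand-in for the page: no table markup, only JSON inside a <script> tag.
SAMPLE_HTML = """
<html><body>
<div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"data": [
  {"country": "A", "gdpPerCapita": 1.5},
  {"country": "B", "gdpPerCapita": 2.5}
]}}}
</script>
</body></html>
"""


class ScriptJSONExtractor(HTMLParser):
    """Collect the text inside the <script> tag that has the given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html, script_id="__NEXT_DATA__"):
    """Return the parsed JSON from the script tag, or raise ValueError."""
    parser = ScriptJSONExtractor(script_id)
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError(f"no <script id={script_id!r}> found")
    return json.loads("".join(parser.chunks))


rows = extract_next_data(SAMPLE_HTML)["props"]["pageProps"]["data"]
print("<table" in SAMPLE_HTML)        # False: nothing for read_html to find
print([r["country"] for r in rows])   # ['A', 'B']
```

The list under props -> pageProps -> data is a list of plain dicts, which is the shape pd.json_normalize accepts.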

Example

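import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://worldpopulationreview.com/countries/by-gdp"

# Fetch the page and parse the JSON held in the __NEXT_DATA__ script tag.
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
payload = json.loads(soup.select_one('#__NEXT_DATA__').text)

# The rows live under props -> pageProps -> data; flatten them into a DataFrame.
df = pd.json_normalize(payload['props']['pageProps']['data'])
df[['continent', 'country', 'pop', 'imfGDP', 'unGDP', 'gdpPerCapita']]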

Output

	continent	country	pop	imfGDP	unGDP	gdpPerCapita
0	North America	United States	338290	2.08938e+13	18624475000000	61762.9
1	Asia	China	1.42589e+06	1.48626e+13	11218281029298	10423.4
...	...	...	...	...	...	...
210	Asia	Syria	22125.2	0	22163075121	1001.71
211	North America	Turks and Caicos Islands	45.703	0	917550492	20076.4
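
Note that the key path ['props']['pageProps']['data'] reflects how the page happens to be built and may change if the site is restructured, in which case the chained lookup fails with a bare KeyError. Below is a small sketch of a lookup that names the missing key and lists the keys that are available instead. dig is a hypothetical helper, and the payload dict is a made-up stand-in for the parsed JSON.

```python
def dig(obj, *keys):
    """Follow keys into nested dicts, reporting where the path breaks."""
    for depth, key in enumerate(keys):
        if not isinstance(obj, dict) or key not in obj:
            path = " -> ".join(map(str, keys[:depth])) or "<root>"
            available = sorted(obj) if isinstance(obj, dict) else type(obj).__name__
            raise KeyError(f"{key!r} not found under {path}; available: {available}")
        obj = obj[key]
    return obj


# Made-up stand-in for the result of json.loads(...) on the script tag's text.
payload = {"props": {"pageProps": {"data": [{"country": "A"}]}}}

print(dig(payload, "props", "pageProps", "data"))  # [{'country': 'A'}]

# A wrong key produces a message that shows where the path broke.
try:
    dig(payload, "props", "pageProps", "rows")
except KeyError as err:
    print(err)
```

With the real page, the call would be dig(payload, 'props', 'pageProps', 'data'), passed to pd.json_normalize as before.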