I am new to Python and am trying to download GDP per capita data for countries from this website: https://worldpopulationreview.com/countries/by-gdp
When I try to read the data, I get a "No tables found" error.
I can see that the data is in r.text,
but somehow pandas cannot read the table.
How can I solve this problem and read the data?
MWE
import pandas as pd
import requests
url = "https://worldpopulationreview.com/countries/by-gdp"
r = requests.get(url)
raw_html = r.text # I can see the data is here, but pd.read_html says no tables found
df_list = pd.read_html(raw_html)
print(len(df_list))
CodePudding user response:
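The data is embedded as JSON inside <script id="__NEXT_DATA__" type="application/json">
and is only rendered into a table by the browser, so the HTML that requests receives contains no <table> for pd.read_html to find. Parse the JSON from the script tag instead: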
pd.json_normalize(
    json.loads(
        BeautifulSoup(requests.get(url).text, "html.parser")
        .select_one("#__NEXT_DATA__")
        .text
    )["props"]["pageProps"]["data"]
)
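To make the extraction step easier to inspect on its own, here is a minimal sketch that pulls the JSON out of the `__NEXT_DATA__` script tag using only the standard library's `html.parser` in place of BeautifulSoup. The class name `NextDataParser`, the function `extract_next_data`, and the sample HTML (including the country "Testland") are made up for illustration; the real page is much larger and its structure may differ.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Start collecting only when the script tag carries the expected id
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed JSON payload, or raise ValueError if the tag is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no <script id='__NEXT_DATA__'> found in the HTML")
    return json.loads("".join(parser.chunks))


# Made-up miniature page that mimics the layout described above
sample_html = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"data": [{"country": "Testland", "pop": 1}]}}}'
    "</script></body></html>"
)
payload = extract_next_data(sample_html)
rows = payload["props"]["pageProps"]["data"]
print(rows)
```

With a real response you would pass `requests.get(url).text` to `extract_next_data` and hand the resulting list to `pd.json_normalize`, assuming the page still uses the same key path.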
Example
import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://worldpopulationreview.com/countries/by-gdp"

# Parse the JSON embedded in the __NEXT_DATA__ script tag and flatten it
df = pd.json_normalize(
    json.loads(
        BeautifulSoup(requests.get(url).text, "html.parser")
        .select_one("#__NEXT_DATA__")
        .text
    )["props"]["pageProps"]["data"]
)
# Keep only the columns of interest
df[["continent", "country", "pop", "imfGDP", "unGDP", "gdpPerCapita"]]
Output
| | continent | country | pop | imfGDP | unGDP | gdpPerCapita |
|---|---|---|---|---|---|---|
| 0 | North America | United States | 338290 | 2.08938e+13 | 18624475000000 | 61762.9 |
| 1 | Asia | China | 1.42589e+06 | 1.48626e+13 | 11218281029298 | 10423.4 |
| ... | ... | ... | ... | ... | ... | ... |
| 210 | Asia | Syria | 22125.2 | 0 | 22163075121 | 1001.71 |
| 211 | North America | Turks and Caicos Islands | 45.703 | 0 | 917550492 | 20076.4 |
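The key path `props` -> `pageProps` -> `data` depends on how the site structures its page, so it could change without notice. As a rough sketch, a small helper can walk the path and report which keys are actually available when one is missing. The name `dig` and the sample payload below are hypothetical, not part of any library or of the real page.

```python
def dig(obj, *keys):
    """Follow keys into nested dicts, reporting where the path breaks."""
    for depth, key in enumerate(keys):
        if not isinstance(obj, dict) or key not in obj:
            # Show what is available at this level to help locate the data
            available = sorted(obj) if isinstance(obj, dict) else type(obj).__name__
            path = " -> ".join(keys[:depth]) or "<top level>"
            raise KeyError(f"{key!r} not found under {path}; available: {available}")
        obj = obj[key]
    return obj


# Made-up payload shaped like the JSON parsed from the script tag
sample_payload = {"props": {"pageProps": {"data": [{"country": "Testland"}]}}}

records = dig(sample_payload, "props", "pageProps", "data")
print(records)

try:
    dig(sample_payload, "props", "pageData", "data")  # deliberately wrong key
except KeyError as err:
    message = str(err)
    print(message)
```

If the lookup fails on the live page, the error message lists the keys that do exist at that level, which is usually enough to find where the records moved.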