How can I work around this 'Forbidden' error when web scraping for data?
import pandas as pd
table_Populations = pd.read_html("https://www.worldometers.info/world-population/population-by-country/", match = "Countries in the world by population (2022)")
df_Populations = pd.DataFrame(table_Populations[0])
# Rename the 'Country (or dependency)' column to 'Country'
df_Populations.rename(columns = {'Country (or dependency)' : 'Country'}, inplace = True)
CodePudding user response:
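The 403 Forbidden status most likely means the server is rejecting requests that do not identify themselves as a browser. Fetch the page with requests, sending a browser-like User-Agent header, and then pass the returned HTML to pd.read_html: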
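import io
import requests
import pandas as pd
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
url = 'https://www.worldometers.info/world-population/population-by-country/'
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop here if the server still answers 403
# Recent pandas versions deprecate passing a raw HTML string, so wrap it in StringIO
df = pd.read_html(io.StringIO(response.text))[0]
print(df)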
Output:
# Country (or dependency) Population (2020) ... Med. Age Urban Pop % World Share
0 1 Honduras 9904607 ... 24 57 % 0.13 %
1 2 United Arab Emirates 9890402 ... 33 86 % 0.13 %
2 3 Djibouti 988000 ... 27 79 % 0.01 %
3 4 Saint Barthelemy 9877 ... N.A. 0 % 0.00 %
4 5 Seychelles 98347 ... 34 56 % 0.00 %
.. ... ... ... ... ... ... ...
230 231 Jordan 10203134 ... 24 91 % 0.13 %
231 232 Portugal 10196709 ... 46 66 % 0.13 %
232 233 Azerbaijan 10139177 ... 32 56 % 0.13 %
233 234 Sweden 10099265 ... 41 88 % 0.13 %
234 235 India 0 ... 28 N.A. 0.00 %
[235 rows x 12 columns]
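If installing requests is not an option, the same idea can be sketched with only the standard library's urllib. This is a rough illustration, not the answer's original method: the helper names build_request and fetch_html are made up for this example, and the User-Agent string is simply the one used above.

```python
import urllib.request
from urllib.error import HTTPError

URL = "https://www.worldometers.info/world-population/population-by-country/"
# Browser-like User-Agent string, copied from the answer above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, opener=urllib.request.urlopen, timeout=30):
    """Download the page as text; turn a 403 into a clearer error message.

    `opener` is injectable (hypothetical design choice) so the function
    can be exercised without touching the network.
    """
    try:
        with opener(build_request(url), timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        if err.code == 403:
            raise RuntimeError(
                f"403 Forbidden for {url}: the server rejected this client"
            ) from err
        raise


# Usage (needs pandas and network access):
#   import io
#   import pandas as pd
#   df = pd.read_html(io.StringIO(fetch_html(URL)))[0]
```

Raising a separate error for 403 makes it easier to tell a rejected client apart from other failures such as a timeout or a 404.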
CodePudding user response:
Your IP address may have been blocked after repeated requests. If that is the cause, sending the request through a proxy or VPN should let the code work.
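As a rough sketch of the proxy route, the standard library's urllib can send requests through a proxy with ProxyHandler. The proxy address below is a hypothetical placeholder, and the names build_proxy_opener and fetch_via_proxy are invented for this example; substitute a proxy you are actually allowed to use.

```python
import urllib.request

# Hypothetical proxy address, for illustration only.
PROXY_URL = "http://127.0.0.1:8080"
# Shortened browser-like User-Agent (assumed value).
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_proxy_opener(proxy_url, user_agent=USER_AGENT):
    """Return an opener that sends HTTP and HTTPS traffic through proxy_url."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    opener = urllib.request.build_opener(proxy_handler)
    # Headers listed here are attached to every request made via this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch_via_proxy(url, proxy_url=PROXY_URL, timeout=30):
    """Download url through the proxy and return the body as text."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

A VPN works at the system level, so it needs no code change; only the proxy case requires configuring the client like this.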