Getting a 'Forbidden' Error when Webscraping using Pandas

Time:07-12

How can I work around this 'Forbidden' error when web scraping for data?

table_Populations = pd.read_html("https://www.worldometers.info/world-population/population-by-country/", match = "Countries in the world by population (2022)")
df_Populations = pd.DataFrame(table_Populations[0])
#Change Country or area to country
df_Populations.rename(columns = {'Country (or dependency)' : 'Country'}, inplace = True)

CodePudding user response:

You need to send a User-Agent header with the request. Without a browser-like User-Agent, the server rejects the request with HTTP status 403 (Forbidden).

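import io
import requests
import pandas as pd
url = 'https://www.worldometers.info/world-population/population-by-country/'
# A browser-like User-Agent keeps the server from answering 403 Forbidden
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
response = requests.get(url, headers=headers)
# StringIO avoids the deprecation warning for raw HTML strings in newer pandas
df = pd.read_html(io.StringIO(response.text))[0]
print(df)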

Output:

  # Country (or dependency)  Population (2020)  ... Med. Age  Urban Pop %  World Share
0      1                Honduras            9904607  ...       24         57 %       0.13 %
1      2    United Arab Emirates            9890402  ...       33         86 %       0.13 %
2      3                Djibouti             988000  ...       27         79 %       0.01 %
3      4        Saint Barthelemy               9877  ...     N.A.          0 %       0.00 %
4      5              Seychelles              98347  ...       34         56 %       0.00 %
..   ...                     ...                ...  ...      ...          ...          ...
230  231                  Jordan           10203134  ...       24         91 %       0.13 %
231  232                Portugal           10196709  ...       46         66 %       0.13 %
232  233              Azerbaijan           10139177  ...       32         56 %       0.13 %   
233  234                  Sweden           10099265  ...       41         88 %       0.13 %   
234  235                   India                  0  ...       28         N.A.       0.00 %   

[235 rows x 12 columns]
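
If you would rather not depend on requests, the same idea can be sketched with the standard library's urllib. This is only a sketch: build_request and fetch_tables are hypothetical helper names, the User-Agent string is simply the one from the snippet above, and pandas is imported lazily so the header logic can be checked on its own.

```python
import io
import urllib.request

# Assumed browser-like User-Agent, copied from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a urllib Request that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_tables(url, user_agent=BROWSER_UA, timeout=30):
    """Download the page and parse every <table> into a DataFrame."""
    import pandas as pd  # third-party; imported here so build_request works without it

    # urlopen raises urllib.error.HTTPError if the server still answers 403.
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset)
    return pd.read_html(io.StringIO(html))
```

Calling fetch_tables('https://www.worldometers.info/world-population/population-by-country/')[0] should then give the same table as the requests version, provided the site accepts that User-Agent.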

CodePudding user response:

Your IP may have been blocked after too many requests. If so, route the request through a proxy or VPN and the code should work.
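
A rough sketch of the proxy route with requests, which accepts a proxies mapping on requests.get. The helper names make_proxies and fetch_html_via_proxy are hypothetical, and the proxy address you pass in is a placeholder you would replace with a proxy you actually have access to.

```python
def make_proxies(proxy_url):
    """Route both HTTP and HTTPS traffic through the same proxy."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_html_via_proxy(url, proxy_url, user_agent, timeout=30):
    """Fetch a page through a proxy and return its HTML text."""
    import requests  # third-party; imported lazily so make_proxies needs nothing extra

    response = requests.get(
        url,
        headers={"User-Agent": user_agent},
        proxies=make_proxies(proxy_url),
        timeout=timeout,
    )
    # Raises requests.HTTPError if the server still answers 403 (or any 4xx/5xx).
    response.raise_for_status()
    return response.text
```

The returned HTML can be passed to pd.read_html (wrapped in io.StringIO) as in the first answer. If the request still fails with 403 through the proxy, the block is more likely about the headers than about your IP.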
