403 Forbidden BeautifulSoup Web Scraper

Time:04-21

I was building a web scraper to pull hrefs from https://www.startengine.com/explore, but I couldn't get any. When I printed the parsed page, I saw why: the server responds with a 403 Forbidden page instead of the real content.

Here is my code:

import pandas as pd
import os
import requests
from bs4 import BeautifulSoup
import re

URL = "https://www.startengine.com/explore"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")

links = []
print(soup)

This is the output:

<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
</body>
</html>

Can someone help me work around the "403 Forbidden"?

CodePudding user response:

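You need to send a User-Agent header with the request. By default, requests identifies itself as python-requests, which some servers reject. Set the header as follows: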

import pandas as pd
import os
import requests
from bs4 import BeautifulSoup
import re

URL = "https://www.startengine.com/explore"
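# requests sends "python-requests/<version>" by default; send a browser-like User-Agent instead
headers = {"User-Agent": "Mozilla/5.0"}
page = requests.get(URL, headers=headers)
print(page)  # shows the response status, e.g. <Response [200]> on success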
soup = BeautifulSoup(page.text, "html.parser")

links = []
print(soup)
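
Since the original goal was to collect hrefs, here is a minimal sketch of how the fix might be extended to do that. The function names extract_hrefs and fetch_links are hypothetical, the 10-second timeout is an arbitrary choice, and it assumes a browser-like User-Agent is all the site checks for. If the page builds its listing with JavaScript, the static HTML may contain fewer links than the browser shows.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

URL = "https://www.startengine.com/explore"
# Assumption: a browser-like User-Agent is enough to avoid the 403.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def extract_hrefs(html, base_url):
    """Return an absolute URL for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors without an href; urljoin resolves relative paths.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


def fetch_links(url=URL):
    """Fetch the page with the User-Agent header and return its links."""
    page = requests.get(url, headers=HEADERS, timeout=10)
    # Raise requests.HTTPError on a 403 (or any other 4xx/5xx) instead of
    # silently parsing the error page.
    page.raise_for_status()
    return extract_hrefs(page.text, url)


if __name__ == "__main__":
    for link in fetch_links():
        print(link)
```

Calling raise_for_status() makes a blocked request fail loudly, so an empty links list can be told apart from a rejected request.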