I am learning a bit about web scraping and am currently working on a small project. With this code I store the page's HTML in the soup variable:
source = requests.get(URL)
soup = BeautifulSoup(source.text, 'html.parser')
The problem: when I inspect the page in my browser, the element looks like this:
<a ...>The Godfather</a>
but when I fetch it in my program, the text inside the tag (The Godfather) comes back translated into my native language (Кум):
<a ...>Кум</a>
I don't want it to be translated. My browser is set entirely to English, and I have no idea why this is happening. Any help would be much appreciated!
CodePudding user response:
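requests does not send the Accept-Language HTTP header by default, so the site is probably choosing a language some other way (for example, from your location). Try specifying the header in your request: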
import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/search/title/?groups=top_100&sort=user_rating,desc"
# Ask the server for English content
headers = {"Accept-Language": "en-US,en;q=0.5"}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

# Print the text of each title heading
for h3 in soup.select("h3"):
    print(h3.get_text(strip=True, separator=" "))
Prints:
1. The Shawshank Redemption (1994)
2. The Godfather (1972)
3. The Dark Knight (2008)
4. The Lord of the Rings: The Return of the King (2003)
5. Schindler's List (1993)
6. The Godfather Part II (1974)
...
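
In case the header value itself looks cryptic: "en-US,en;q=0.5" is a comma-separated list of language tags, each with an optional q weight between 0 and 1 (a tag with no q counts as 1). The sketch below only illustrates how such a value reads; parse_accept_language is a hypothetical helper written for this answer, not part of requests or BeautifulSoup, and it ignores edge cases a real server would handle.

```python
def parse_accept_language(value):
    """Return (language_tag, weight) pairs, highest weight first.

    Hypothetical helper for illustration only; it handles the simple
    "tag;q=weight" form and nothing more.
    """
    prefs = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        weight = 1.0  # a tag without an explicit q parameter defaults to 1
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0  # treat an unreadable weight as "not wanted"
        prefs.append((tag.strip(), weight))
    # sorted() is stable, so tags with equal weight keep their original order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("en-US,en;q=0.5"))
# [('en-US', 1.0), ('en', 0.5)]
```

So the header in the answer says "prefer US English, and accept any other English at half the preference".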