Parse HTML to find titles with Python and BeautifulSoup

Time:11-29

This is the code I'm currently using...

import requests
from bs4 import BeautifulSoup


headers = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '3600',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }

url = "https://blah.com"
# Pass the headers by keyword; the second positional argument of requests.get is params
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')

titles = soup.select('a.title')
print(titles)

When I run this Python script, I get back a bunch of elements that look similar to this...

<a  fill="false" arrow="false" duration="0" followcursor="1" theme="translucent" title-auto-hide="Blah" href="/url/blah/" title="Blah">Blah</a>

I'm trying to parse the data so that only the title, Blah, is shown. How can I make this happen?

User response:

If I understand you correctly, you want the value of the title attribute:

titles = soup.select("a.title")

for a in titles:
    print(a["title"])
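
One thing to watch for: a["title"] raises a KeyError if a matched link has no title attribute. A minimal sketch of two ways around that, assuming some links on the page might lack the attribute (the sample HTML here is made up for illustration, modelled on the output in the question):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second link deliberately has no title attribute.
html = """
<a class="title" href="/url/blah/" title="Blah">Blah</a>
<a class="title" href="/url/other/">No title attribute here</a>
"""

soup = BeautifulSoup(html, "html.parser")

# .get() returns None instead of raising KeyError when the attribute is missing.
found = [a.get("title") for a in soup.select("a.title")]

# Alternatively, let the CSS selector skip links that lack the attribute.
only_titled = [a["title"] for a in soup.select("a.title[title]")]

print(found)
print(only_titled)
```

Which one fits probably depends on whether you need to know that a link was missing its title or just want to ignore it.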

If you want the text inside <a>:

titles = soup.select("a.title")

for a in titles:
    print(a.text)
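
If the link text comes back with surrounding whitespace or newlines, get_text(strip=True) may be tidier than .text. A small sketch, again using made-up sample HTML rather than the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with whitespace around the link text.
html = """
<a class="title" href="/url/blah/" title="Blah">
    Blah
</a>
"""

soup = BeautifulSoup(html, "html.parser")

# .text keeps the whitespace exactly as it appears in the markup.
raw = [a.text for a in soup.select("a.title")]

# get_text(strip=True) trims leading and trailing whitespace from the text.
clean = [a.get_text(strip=True) for a in soup.select("a.title")]

print(raw)
print(clean)
```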