Scraping HTML site in h3 tags

Time:10-31

import requests
from bs4 import BeautifulSoup

url = 'http://www.columbia.edu/~fdc/sample.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
items = soup.findAll('h3')
print(items)

I get this output:
[<h3 id="contents">CONTENTS</h3>, <h3 id="basics">1. Creating a Web Page</h3>, <h3 id="syntax">2. HTML Syntax</h3>...
How can I get this output instead, with only the text of each tag?
[CONTENTS, 1. Creating a Web Page, 2. HTML Syntax...

CodePudding user response:

If you want a list of the text inside the h3 tags, iterate over all the h3 tags and keep only each tag's text (its .text attribute) instead of the tag object itself.

import requests
from bs4 import BeautifulSoup

url = 'http://www.columbia.edu/~fdc/sample.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# find_all is the current name for findAll; .text returns the tag's text without markup
items = [h3.text for h3 in soup.find_all('h3')]
print(items)

Output:

['CONTENTS', '1. Creating a Web Page', '2. HTML Syntax', '3. Special Characters', '4. Converting Plain Text to HTML', '5. Effects', '6. Lists', '7. Links', '8. Tables', '9. Viewing Your Web Page', '10. Installing Your Web Page on the Internet', '11. Where to go from here', '12. Postscript: Cell Phones']
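
If some headings contain line breaks or nested tags, .text keeps the surrounding whitespace as it appears in the markup. The sketch below shows one possible way to normalize it. It runs on a small inline HTML string (a made-up stand-in for the downloaded page, not the real content of the site), and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for response.text, so no network call is needed.
html = """
<h3 id="contents">CONTENTS</h3>
<h3 id="basics">1. Creating a <em>Web Page</em></h3>
<h3 id="syntax">
  2. HTML Syntax
</h3>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text() returns the text of the tag and its children;
# split() + join() collapses newlines and repeated spaces into single spaces.
titles = [" ".join(h3.get_text().split()) for h3 in soup.find_all("h3")]
print(titles)

# Optionally keep the id attribute next to each title (None if a tag has no id).
titles_by_id = {h3.get("id"): " ".join(h3.get_text().split())
                for h3 in soup.find_all("h3")}
print(titles_by_id)
```

For the page in the question, the plain .text version above already gives clean strings, so this normalization is only needed when the markup is less tidy.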