I have an HTML page and need to collect the text of each heading, that is, the text contained between the <h3> and </h3> tags:
<h3 id="basics">1. Creating a Web Page</h3>
<p>
Once you've made your "home page" (index.html) you can add more pages to
your site, and your home page can link to them.
<h3 id="syntax">>2. HTML Syntax</h3>
I don't know how to write a pattern for this. Please help me get the values "1. Creating a Web Page" and ">2. HTML Syntax".
CodePudding user response:
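You can use a library like BeautifulSoup to parse the page. Note that the "html5lib" parser used here is a separate package that has to be installed alongside bs4 and requests.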
import requests
from bs4 import BeautifulSoup
html = requests.get('url to your page')
html.encoding = 'utf-8'
sp = BeautifulSoup(html.text, "html5lib")
# to get all h3 in the page
list_h3 = sp.find_all('h3')
for h3 in list_h3:
    print(h3.text)
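If installing third-party packages is not an option, the same idea can be sketched with the standard-library html.parser module instead of BeautifulSoup. This is only a minimal sketch run against the sample from the question; the class name H3Collector and its attributes are made up for illustration.

```python
from html.parser import HTMLParser

# Sample markup copied from the question.
SAMPLE = """<h3 id="basics">1. Creating a Web Page</h3>
<p>
Once you've made your "home page" (index.html) you can add more pages to
your site, and your home page can link to them.
<h3 id="syntax">>2. HTML Syntax</h3>"""


class H3Collector(HTMLParser):
    """Hypothetical helper: collects the text inside every <h3> element."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False   # True while we are between <h3> and </h3>
        self.parts = []      # text chunks of the heading being read
        self.headings = []   # finished heading texts

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
            self.parts = []

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join later.
        if self.in_h3:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self.in_h3:
            self.headings.append("".join(self.parts).strip())
            self.in_h3 = False


parser = H3Collector()
parser.feed(SAMPLE)
parser.close()
print(parser.headings)
```

The stray ">" in the second heading is treated as ordinary text, so it should come through as part of the value.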
CodePudding user response:
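This should work for a string holding a single h3 element, by stripping the tag markup around the text. Splitting only on the first ">" and the last "</" keeps any ">" that is part of the heading text, as in ">2. HTML Syntax":
html = "<h3 id='basics'>1. Creating a Web Page</h3>"
# drop the opening "<h3", cut after the first ">", then cut before the last "</"
text = html.replace("<h3", "", 1).split(">", 1)[1].rsplit("</", 1)[0]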
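Since the question asks for a pattern, a regular expression is another option. The following is a minimal sketch, assuming the headings are not nested and the page is reasonably well formed; regular expressions are fragile on arbitrary HTML, so a real parser is usually the safer choice. The name H3_PATTERN is made up for illustration.

```python
import re

# Sample markup copied from the question.
html = """<h3 id="basics">1. Creating a Web Page</h3>
<p>
Once you've made your "home page" (index.html) you can add more pages to
your site, and your home page can link to them.
<h3 id="syntax">>2. HTML Syntax</h3>"""

# "<h3", optional attributes, ">", then lazily capture up to the next "</h3>".
# DOTALL lets a heading span several lines; IGNORECASE also accepts <H3>.
H3_PATTERN = re.compile(r"<h3\b[^>]*>(.*?)</h3>", re.IGNORECASE | re.DOTALL)

headings = [text.strip() for text in H3_PATTERN.findall(html)]
print(headings)  # ['1. Creating a Web Page', '>2. HTML Syntax']
```

Because [^>]* stops at the first ">" that closes the opening tag, the extra ">" in the second heading ends up inside the captured text, which matches the value the question asks for.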