Home > Software engineering >  How to extract the data in between 2 header tags?
How to extract the data in between 2 header tags?

Time:02-01

I'm working on scraping the website and I want to extract the data in between the 2 headers and tag it to first tag as key-value pair.

How to extract the text under headers (like h1 and h2) ?

soup = BeautifulSoup(page.content, 'html.parser')
items = soup.select("div.conWrap")

htag_count = []
item_header = soup.find_all(re.compile('^h[1-6]'))
for item in item_header:
    htag_count.append({item.name:item.text})

print(htag_count)

CodePudding user response:

This won't work if the h_ tags don't share a direct sampop

You can also take a look at this solution if you're interested in nesting subsection into the parent sections.

  • Related