Home > Back-end >  How to scrape "markup"-text from html code
How to scrape "markup"-text from html code

Time:09-27

I would like to scrape text from a website using Beautifulsoup.

From the code which can be seen in the screenshot above, I would like to retrieve the text stored under markup="content".

I know how to retrieve all markup-parts using soup.find("div", class_="break-word news_main"). But I don't know how to only access the markup="content"-part.

Help would be very much appreciated :)

(Also, is markup a special term in html like "class" or "id"? What does "markup" mean?)

Picture of html code

CodePudding user response:

soup.find_all(markup="content"), as you can see you can filter by any attribute. See docs for more details. markup is some kind of custom attribute; it's not a valid attribute for div elements but you can technically have any attribute on an element and it will still work, both in HTML and in BeautifulSoup.

CodePudding user response:

You can use soup.find_all(markup='content') or a different syntax is soup.find_all('div', {'markup': 'content'} which will give you all divs that contain the attribute mark up with a value of content.

Markup can mean quite a few things. It's basically a system for stylizing text. You can see more here. In this context it's probably something used to style html on a webpage.

  • Related