How to extract text from HTML tags while keeping its order

Time:12-01

I want to process a string that contains text mixed with HTML tags.

Consider the string:

str = "before <b>This text is bold</b> after. <i>italic</i>"

To give more context: I use a PIL ImageDraw object to draw word-wrapped text within a specified width. Part of the code looks like this:

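  from PIL import ImageColor

  rect = Rectangle(x, y, width, height)
  curx = rect.x
  cury = rect.y
  for word in allWords:
    # advance width of the word plus a trailing space, in pixels
    # (font.getsize() was removed in Pillow 10; font.getlength() returns the width)
    wordWidth = font.getlength(word + " ")
    # wrap to the next line if this word would overflow the rectangle
    if curx + wordWidth > rect.x + rect.width:
      cury += line_height
      curx = rect.x
    draw.text((curx, cury), word, ImageColor.getcolor(hex, "RGB"), font=font)
    curx += wordWidth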
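The loop above is a greedy line-breaking pass. As a sketch (not the asker's code), the same logic can be pulled out into a pure function that takes a hypothetical `measure` callable in place of the PIL font, which makes it testable without drawing anything and lets each word be measured with whatever font it needs. One deliberate change: a word wider than the box is kept on an otherwise empty line instead of first emitting a blank line.

```python
from typing import Callable, List, Tuple


def wrap_words(
    words: List[str],
    measure: Callable[[str], float],  # hypothetical: returns the pixel width of a string
    x: float,
    y: float,
    width: float,
    line_height: float,
) -> List[Tuple[float, float, str]]:
    """Greedy word wrap: return (x, y, word) positions inside a box of the given width."""
    positions = []
    curx, cury = x, y
    for word in words:
        word_width = measure(word + " ")  # word plus its trailing space
        # start a new line if this word would overflow, unless the line is still empty
        if curx + word_width > x + width and curx > x:
            cury += line_height
            curx = x
        positions.append((curx, cury, word))
        curx += word_width
    return positions


# usage with a fake fixed-width "font": every character is 10 px wide
layout = wrap_words("before This text is bold".split(), lambda s: 10 * len(s), 0, 0, 120, 20)
for pos in layout:
    print(pos)
```

With PIL, `measure` would be something like `font.getlength`, and the returned positions are passed to `draw.text`.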

Of course, the contents of str can vary, which makes walking it with BeautifulSoup's previousSibling and nextSibling difficult.

How can I handle this so that each piece of text is drawn with the proper font for its style (for example bold or italic)?

Answer:

Use BeautifulSoup's .children, which yields the direct children of a tag (plain strings and tags) in document order:

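from bs4 import BeautifulSoup

data = "before <b>This text is bold</b> after. <i>italic</i>"
# the lxml parser wraps a bare text fragment in <html><body><p>...</p></body></html>,
# which is why soup.p exists here
soup = BeautifulSoup(data, 'lxml')
for child in soup.p.children:
    print(child)

# Output:
# before 
# <b>This text is bold</b>
#  after. 
# <i>italic</i>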
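.children only looks one level deep, so nested markup such as `<b><i>...</i></b>` comes back as a single tag. A minimal sketch that keeps the order and also handles nesting, using Python's built-in `html.parser.HTMLParser` (swapped in for BeautifulSoup here, so no lxml dependency), tracks the stack of open tags and tags each word with a style. The names `StyledTextParser`, `styled_words` and the style labels are hypothetical.

```python
from html.parser import HTMLParser
from typing import FrozenSet, List, Tuple


class StyledTextParser(HTMLParser):
    """Collect (text, tags) runs in document order; tags holds the enclosing tag names."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: List[str] = []  # stack of currently open tags
        self.runs: List[Tuple[str, FrozenSet[str]]] = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # pop back to the matching start tag; ignore stray end tags
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if data:
            self.runs.append((data, frozenset(self.open_tags)))


STYLE_NAMES = {
    (False, False): "regular",
    (True, False): "bold",
    (False, True): "italic",
    (True, True): "bold_italic",
}


def styled_words(html: str) -> List[Tuple[str, str]]:
    """Split the HTML into (word, style) pairs, keeping the original order."""
    parser = StyledTextParser()
    parser.feed(html)
    parser.close()  # flush any trailing text
    words = []
    for text, tags in parser.runs:
        bold = bool(tags & {"b", "strong"})
        italic = bool(tags & {"i", "em"})
        style = STYLE_NAMES[(bold, italic)]
        for word in text.split():
            words.append((word, style))
    return words


for word, style in styled_words("before <b>This text is bold</b> after. <i>italic</i>"):
    print(word, style)
```

In the drawing loop, each word's style can then select a font from a dictionary such as `fonts = {"regular": ..., "bold": ..., ...}` (hypothetical, built with `ImageFont.truetype` from whatever font files you have), and the word is both measured and drawn with `fonts[style]`. Splitting each run on whitespace means a word whose style changes mid-word (`<b>bo</b>ld`) becomes two words; handling that would require laying out runs rather than whole words.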