Beautiful Soup 4 doesn't update the full soup object when new contents added to tag

Time:07-20

I have a Python program that opens an HTML table template using BS4, as shown below. Note: the HTML is used in a system that doesn't require the surrounding document-level tags etc.; it only requires the actual table code.

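from shutil import copy  # assumption: copy() here is shutil.copy

from bs4 import BeautifulSoup

src = r"template.html"
dst = r"A1.html"
copy(src, dst)  # work on a copy so the template stays untouched
with open(dst) as f:
    content = f.read()
    soup = BeautifulSoup(content, 'html.parser')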

I've already written a bunch of functions that read some values from another file and change the table based on those.

I now have a <p> tag that I want to insert content into. The tag sits in the following markup:

<tr>
      <td rowspan="5">
        <p >Noise</p>
        <p class="pnplot">Insert Plot</p>
      </td>
...

I want to add the plot where it says "Insert Plot". My current code is this:

attachment_html = """
    <ac:link>
            <ri:attachment ri:filename={0}/>
            <ac:plain-text-link-body>
              <![CDATA[{1}]]>
            </ac:plain-text-link-body>
    </ac:link>
    """
soup.find("p", {"class": "pnplot"}).contents = [attachment_html.format(pn_filename, "see PN plot")]
print(soup.find("p", {"class": "pnplot"}).contents)
# The line above prints:
# ['\n    <ac:link>\n            <ri:attachment ri:filename=2.56Ghz_PhaseNoise.png/>\n            <ac:plain-text-link-body>\n              <![CDATA[see PN plot]]>\n            </ac:plain-text-link-body>\n    </ac:link>\n    ']
print(soup)

So I can add the new string into the contents of the p tag. But then when I print(soup), it does NOT show the updated tag contents. It gives an empty tag:

<p class="pnplot"></p>

I have tried various ways of changing the contents, using .contents, .string, and replace_with(), but none of these methods actually changes the tag properly. I've also tried adding a regular string (e.g. "hello") instead of the attachment_html string. This gives the same issue.

How would I add in the string I want to add while also having it update in the soup object?

Edit:

soup.find("p", {"class": "pnplot"}).contents =[attachment_html.format(pn_filename, "see PN plot")]

With the above code, the string is added into the soup object, but the "<" and ">" in the string are replaced with "&lt;" and "&gt;". I need the < and > to remain in the string, as they belong to XHTML-based tags which will display an attachment link.
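The escaping described in the edit can be reproduced in isolation. The following is a minimal sketch, assuming a one-line stand-in for the template and the built-in html.parser backend; the variable names are illustrative only. It shows that a string assigned to a tag becomes a text node (so its markup characters are escaped on output), whereas a fragment that is parsed first is inserted as real tags.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the template: one cell with the placeholder paragraph.
html_doc = '<td><p class="pnplot">Insert Plot</p></td>'
soup = BeautifulSoup(html_doc, "html.parser")
target = soup.find("p", {"class": "pnplot"})

# Assigning a plain string creates a text node, so "<" and ">" are
# escaped when the tree is serialized.
target.string = "<ac:link>see PN plot</ac:link>"
escaped_output = str(soup)

# Parsing the fragment first yields Tag objects, which serialize as markup.
fragment = BeautifulSoup("<ac:link>see PN plot</ac:link>", "html.parser")
target.clear()
target.append(fragment.find("ac:link"))
markup_output = str(soup)

print(escaped_output)
print(markup_output)
```

The first print shows the entity-escaped text inside the paragraph; the second shows the ac:link element as actual markup.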

CodePudding user response:

Try creating a new soup from attachment_html and inserting the parsed tag, instead of assigning the raw string to .contents:

from bs4 import BeautifulSoup, CData

html_doc = """\
<tr>
      <td rowspan="5">
        <p >Noise</p>
        <p class="pnplot">Insert Plot</p>
      </td>
</tr>"""

attachment_html = """\
<ac:link>
        <ri:attachment ri:filename="2.56Ghz_PhaseNoise.png"/>
        <ac:plain-text-link-body>
            Here will go CDATA
        </ac:plain-text-link-body>
</ac:link>"""


soup = BeautifulSoup(html_doc, "lxml")
soup2 = BeautifulSoup(attachment_html, "lxml")

body = soup2.find("ac:plain-text-link-body")
body.string.replace_with(CData("see PN plot"))

soup.find("p", {"class": "pnplot"}).string.replace_with(soup2.find("ac:link"))
print(soup.tr.prettify())

Prints:

<tr>
 <td rowspan="5">
  <p >
   Noise
  </p>
  <p class="pnplot">
   <ac:link>
    <ri:attachment ri:filename="2.56Ghz_PhaseNoise.png">
    </ri:attachment>
    <ac:plain-text-link-body>
     <![CDATA[see PN plot]]>
    </ac:plain-text-link-body>
   </ac:link>
  </p>
 </td>
</tr>
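As a variation on the answer above, the link can also be built node by node instead of parsing a second document. This is only a sketch under a few assumptions: build_attachment_link is a hypothetical helper name, the template is reduced to a single cell, and html.parser is used so that lxml is not required.

```python
from bs4 import BeautifulSoup, CData


def build_attachment_link(soup, filename, label):
    """Hypothetical helper: build an <ac:link> element for an attachment."""
    link = soup.new_tag("ac:link")
    attachment = soup.new_tag("ri:attachment")
    attachment["ri:filename"] = filename  # set as an attribute, so it is quoted on output
    body = soup.new_tag("ac:plain-text-link-body")
    body.append(CData(label))  # CData nodes are written out without escaping
    link.append(attachment)
    link.append(body)
    return link


html_doc = '<tr><td rowspan="5"><p class="pnplot">Insert Plot</p></td></tr>'
soup = BeautifulSoup(html_doc, "html.parser")

target = soup.find("p", {"class": "pnplot"})
target.clear()  # drop the "Insert Plot" placeholder text
target.append(build_attachment_link(soup, "2.56Ghz_PhaseNoise.png", "see PN plot"))

result = str(soup)
print(result)
```

Because every piece is a Tag or CData object rather than a plain string, the angle brackets survive serialization, and the filename can be passed in per plot.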

CodePudding user response:

My solution was to add the attachment_html directly in the template html file and then edit its attributes to make it display properly:

<p >
          <ac:link>
                  <ri:attachment class="pnplot" ri:filename="" />
                  <ac:plain-text-link-body>
                    <![CDATA[see PN plot]]>
                  </ac:plain-text-link-body>
          </ac:link>
      </p>

The filename attribute is then set from Python:

print(soup.find("ri:attachment", {"class": "pnplot"})["ri:filename"])  # empty before the update
soup.find("ri:attachment", {"class": "pnplot"})["ri:filename"] = pn_filename
print(soup.find("ri:attachment", {"class": "pnplot"})["ri:filename"])  # now the plot filename

This results in the tags being saved and shown properly in the output HTML file.
