I have a Python program that opens an HTML table template with BeautifulSoup (bs4), as shown below. Note: the HTML is used in a system that does not require the usual document-level wrapper tags; it only needs the table markup itself.
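from shutil import copy  # assumption: copy() in the original is shutil.copy
from bs4 import BeautifulSoup

src = r"template.html"
dst = r"A1.html"
copy(src, dst)  # work on a copy so the template itself stays untouched
with open(dst) as f:
    content = f.read()
soup = BeautifulSoup(content, 'html.parser')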
I've already written a bunch of functions that read some values from another file and change the table based on those.
I now have a <p> tag that I want to insert content into. The tag is formatted as follows:
<tr>
<td rowspan="5">
<p >Noise</p>
<p class="pnplot">Insert Plot</p>
</td>
...
I want to add the plot where it says "Insert Plot". My current code is this:
attachment_html = """
<ac:link>
<ri:attachment ri:filename={0}/>
<ac:plain-text-link-body>
<![CDATA[{1}]]>
</ac:plain-text-link-body>
</ac:link>
"""
soup.find("p", {"class": "pnplot"}).contents = [attachment_html.format(pn_filename, "see PN plot")]
print(soup.find("p", {"class": "pnplot"}).contents)
# The line above prints:
# ['\n <ac:link>\n <ri:attachment ri:filename=2.56Ghz_PhaseNoise.png/>\n <ac:plain-text-link-body>\n <![CDATA[see PN plot]]>\n </ac:plain-text-link-body>\n </ac:link>\n ']
print(soup)
So I can add the new string into the contents of the p tag. But then when I print(soup), it does NOT show the updated tag contents. It gives an empty tag:
<p class="pnplot"></p>
I have tried various ways of changing the contents, using .contents, .string, and replace_with(), but none of them change the tag properly. I have also tried adding a regular string (e.g. "hello") instead of the attachment_html string. This gives the same issue.
How would I add in the string I want to add while also having it update in the soup object?
Edit:
soup.find("p", {"class": "pnplot"}).contents =[attachment_html.format(pn_filename, "see PN plot")]
With the above code, the string is added to the soup object, but the "<" and ">" in the string are replaced with "&lt;" and "&gt;". I need the < and > to remain in the string, because they belong to XHTML-based tags that display an attachment link.
CodePudding user response:
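A plain string placed into the tree is treated as text, so its angle brackets are escaped on output. Try creating a new soup from attachment_html and inserting the parsed tag instead: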
from bs4 import BeautifulSoup, CData
html_doc = """\
<tr>
<td rowspan="5">
<p >Noise</p>
<p class="pnplot">Insert Plot</p>
</td>
</tr>"""
attachment_html = """\
<ac:link>
<ri:attachment ri:filename="2.56Ghz_PhaseNoise.png"/>
<ac:plain-text-link-body>
Here will go CDATA
</ac:plain-text-link-body>
</ac:link>"""
soup = BeautifulSoup(html_doc, "lxml")
soup2 = BeautifulSoup(attachment_html, "lxml")
body = soup2.find("ac:plain-text-link-body")
body.string.replace_with(CData("see PN plot"))
soup.find("p", {"class": "pnplot"}).string.replace_with(soup2.find("ac:link"))
print(soup.tr.prettify())
Prints:
<tr>
<td rowspan="5">
<p >
Noise
</p>
<p class="pnplot">
<ac:link>
<ri:attachment ri:filename="2.56Ghz_PhaseNoise.png">
</ri:attachment>
<ac:plain-text-link-body>
<![CDATA[see PN plot]]>
</ac:plain-text-link-body>
</ac:link>
</p>
</td>
</tr>
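The difference between inserting a string and inserting a parsed tag can be seen side by side in the following minimal sketch. It is not the answerer's exact code: it assumes the built-in "html.parser" backend rather than "lxml", a shortened row, and illustrative variable names (soup_text, soup_tags, fragment, escaped_output, parsed_output) that do not appear in the original post.

```python
from bs4 import BeautifulSoup, CData

# Shortened stand-in for the template row (assumption, not the real template).
html_doc = '<tr><td rowspan="5"><p>Noise</p><p class="pnplot">Insert Plot</p></td></tr>'
snippet = (
    '<ac:link>'
    '<ri:attachment ri:filename="2.56Ghz_PhaseNoise.png"/>'
    '<ac:plain-text-link-body></ac:plain-text-link-body>'
    '</ac:link>'
)

# Case 1: a plain str is stored as text, so "<" and ">" are escaped on output.
soup_text = BeautifulSoup(html_doc, "html.parser")
soup_text.find("p", {"class": "pnplot"}).string = snippet
escaped_output = str(soup_text)

# Case 2: parse the snippet first, then move the resulting tag into the tree.
soup_tags = BeautifulSoup(html_doc, "html.parser")
fragment = BeautifulSoup(snippet, "html.parser")
# Build the CDATA section explicitly instead of relying on the parser.
fragment.find("ac:plain-text-link-body").append(CData("see PN plot"))
target = soup_tags.find("p", {"class": "pnplot"})
target.clear()                              # drop the "Insert Plot" placeholder
target.append(fragment.find("ac:link"))     # insert real tags, not text
parsed_output = str(soup_tags)

print(escaped_output)
print(parsed_output)
```

In the first case the markup comes out as escaped text, while in the second the ac:link element is part of the tree and is serialized as real tags.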
CodePudding user response:
My solution was to put the attachment_html markup directly in the template HTML file and then edit its attributes from Python so that it displays properly:
<p >
<ac:link>
<ri:attachment class="pnplot" ri:filename="" />
<ac:plain-text-link-body>
<![CDATA[see PN plot]]>
</ac:plain-text-link-body>
</ac:link>
</p>
print(soup.find("ri:attachment", {"class": "pnplot"})["ri:filename"])
soup.find("ri:attachment", {"class": "pnplot"})["ri:filename"] = pn_filename
print(soup.find("ri:attachment", {"class": "pnplot"})["ri:filename"])
This results in the tags being saved and shown properly in the output HTML file.
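For reference, the template approach can be sketched end to end as follows. This is a hedged illustration rather than the poster's actual file: the template string is a shortened stand-in that leaves out the link body, and the names template, attachment, and result are hypothetical.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the template (assumption); the link body is omitted.
template = (
    '<p>'
    '<ac:link>'
    '<ri:attachment class="pnplot" ri:filename="" />'
    '</ac:link>'
    '</p>'
)
pn_filename = "2.56Ghz_PhaseNoise.png"

soup = BeautifulSoup(template, "html.parser")

# The tag already exists in the tree, so only an attribute value changes
# and no markup has to be inserted as a string.
attachment = soup.find("ri:attachment", {"class": "pnplot"})
attachment["ri:filename"] = pn_filename

result = str(soup)
print(result)
```

Because only an attribute is edited, nothing in the output needs escaping, which avoids the problem described in the question.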