I have an HTML tag like the following:
print(tag)
<td style="background-color: #e5e5e5;">
<p style="
margin: 0;
font-size: 12px;
line-height: 16px;
font-family: Arial, sans-serif;
text-align: center;
">10</p>
</td>
I can update the value from 10 to 15 in the tag with BeautifulSoup:
tag.p.contents[0].replaceWith(str(15))
However, I haven't figured out a way to update the values inside the style declarations, because they seem to be part of their parent, or 'base', tags.
For example, how would I update the tag to the following?
print(tag) -->
<td style="background-color: #762157;">
<p style="
margin: 0;
font-size: 12px;
line-height: 17px;
font-family: Arial, sans-serif;
text-align: center;
">10</p>
</td>
That is, I change the background-color to #762157 and the line-height to 17px.
CodePudding user response:
In the context of the HTML you posted, style is not a tag but an attribute of the p tag. Here is one way to modify that attribute on p (you can apply the same approach to the style attribute of td):
from bs4 import BeautifulSoup as bs
html = '''
<td style="background-color: #e5e5e5;">
<p style="
margin: 0;
font-size: 12px;
line-height: 16px;
font-family: Arial, sans-serif;
text-align: center;
">10</p>
</td>
'''
soup = bs(html, 'html.parser')
print('OLD SOUP')
print(soup.prettify())
print('______________')
p_style_attribute = soup.select_one('p').get('style')
new_p_style_attr = '''
margin: 3;
font-size: 17px;
line-height: 26px;
font-family: Arial, sans-serif;
text-align: center;
'''
soup.select_one('p')['style'] = new_p_style_attr
print('NEW SOUP')
print(soup.prettify())
This prints the following in the terminal:
OLD SOUP
<td style="background-color: #e5e5e5;">
<p style="
margin: 0;
font-size: 12px;
line-height: 16px;
font-family: Arial, sans-serif;
text-align: center;
">
10
</p>
</td>
______________
NEW SOUP
<td style="background-color: #e5e5e5;">
<p style="
margin: 3;
font-size: 17px;
line-height: 26px;
font-family: Arial, sans-serif;
text-align: center;
">
10
</p>
</td>
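If you only want to change one or two properties rather than retype the whole attribute, one option is to parse the style string into a dictionary, update the entries you care about, and write it back. The following is a minimal sketch of that idea, not an official BeautifulSoup feature: set_style_props is a hypothetical helper name, and splitting on ; is an assumption that holds for simple declarations like the ones in the question (it would break on values that themselves contain a semicolon, such as some data URIs).

```python
from bs4 import BeautifulSoup

def set_style_props(tag, **props):
    """Update individual CSS declarations in tag['style'], keeping the rest."""
    # Parse 'prop: value; ...' into a dict (insertion order is preserved).
    style = {}
    for decl in tag.get('style', '').split(';'):
        if ':' in decl:
            name, value = decl.split(':', 1)
            style[name.strip()] = value.strip()
    # Keyword names use underscores because '-' is not valid in identifiers.
    for name, value in props.items():
        style[name.replace('_', '-')] = value
    # Serialize back to a single-line style string.
    tag['style'] = '; '.join(f'{k}: {v}' for k, v in style.items()) + ';'

html = '''
<td style="background-color: #e5e5e5;">
<p style="
margin: 0;
font-size: 12px;
line-height: 16px;
font-family: Arial, sans-serif;
text-align: center;
">10</p>
</td>
'''

soup = BeautifulSoup(html, 'html.parser')
set_style_props(soup.td, background_color='#762157')
set_style_props(soup.p, line_height='17px')
print(soup.td['style'])
print(soup.p['style'])
```

Note that this rewrites the style attribute on a single line, so the original line breaks inside the attribute are not preserved; the declarations and their order are.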
Here is the documentation for BeautifulSoup:
https://beautiful-soup-4.readthedocs.io/en/latest/index.html
CodePudding user response:
A regex approach. Collect the new style information in a dictionary of (tag_name, attr_value) pairs and pass it to the update_style function, which modifies the soup in place.
The substitution tolerates extra whitespace and a trailing ; in the new value. A **subs signature would also be possible.
from bs4 import BeautifulSoup as bs
import re
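def update_style(soup, subs):
    # subs maps a tag name to one 'property: value' declaration
    for tag_name, attr in subs.items():
        tag = soup.find(tag_name)
        k, v = attr.split(':', 1)
        # swap the current value of property k for the new one
        tag['style'] = re.sub(rf'{k}:(.+?);', f'{k}: {v.strip(" ;")};', tag['style'])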
html = '''
<td style="background-color: #e5e5e5;">
<p style="
margin: 0;
font-size: 12px;
line-height: 16px;
font-family: Arial, sans-serif;
text-align: center;
">10</p>
</td>
'''
soup = bs(html, 'lxml')
subs = {'td': 'background-color: #762157;', 'p': 'line-height: 17px;'}
update_style(soup, subs)
print(soup.prettify())
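The **subs variant mentioned above could look roughly like this. It is a sketch under a few assumptions: it uses the built-in 'html.parser' instead of 'lxml' to avoid an extra dependency, it escapes the property name with re.escape, and it skips tags that are missing or have no style attribute. Each keyword argument is a tag name and each value is a single 'property: value' declaration; a value without a colon would raise a ValueError.

```python
import re
from bs4 import BeautifulSoup

def update_style(soup, **subs):
    """Keyword form: tag name -> one 'property: value' declaration."""
    for tag_name, decl in subs.items():
        tag = soup.find(tag_name)
        if tag is None or not tag.has_attr('style'):
            continue  # nothing to update for this tag
        prop, value = (part.strip(' ;') for part in decl.split(':', 1))
        # The lookbehind stops 'height' from matching inside 'line-height'.
        pattern = rf'(?<![\w-]){re.escape(prop)}\s*:[^;]*;?'
        # A function replacement avoids backslash processing in the new value.
        tag['style'] = re.sub(pattern, lambda m: f'{prop}: {value};',
                              tag['style'], count=1)

html = '''
<td style="background-color: #e5e5e5;">
<p style="
margin: 0;
font-size: 12px;
line-height: 16px;
font-family: Arial, sans-serif;
text-align: center;
">10</p>
</td>
'''

soup = BeautifulSoup(html, 'html.parser')
update_style(soup, td='background-color: #762157;', p='line-height: 17px;')
print(soup.prettify())
```

An existing dictionary can still be passed by unpacking it, as in update_style(soup, **subs). Like the original function, this only touches the first matching tag of each name.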