How do I edit attributes of HTML tags using Beautiful Soup

Time:09-11

I have an HTML tag like the following:

print(tag)

<td style="background-color: #e5e5e5;">
  <p style="
     margin: 0; 
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;
     ">10</p>
</td>

I can update the value from 10 to 15 in the tag with BeautifulSoup:

tag.p.contents[0].replace_with(str(15))

However, I haven't figured out how to update the values inside the style attributes, because they seem to be part of their parent (or 'base') tags.

For example, how would I update the tag to the following?

print(tag) --> 
<td style="background-color: #762157;">
   <p style="
      margin: 0;
      font-size: 12px;
      line-height: 17px;
      font-family: Arial, sans-serif;
      text-align: center;
      ">10</p>
</td>

That is, I want to change background-color to #762157 and line-height to 17px.

CodePudding user response:

In the HTML you posted, style is not a tag but an attribute of the p tag. Here is one way to modify that attribute on the p tag (the same approach works for the style attribute of the td tag):

from bs4 import BeautifulSoup as bs

html = '''
<td style="background-color: #e5e5e5;">
  <p style="
     margin: 0; 
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;
     ">10</p>
</td>
'''

soup = bs(html, 'html.parser')
print('OLD SOUP')
print(soup.prettify())
print('______________')
# Read the current style value (shown for reference; it is not used below)
p_style_attribute = soup.select_one('p').get('style')
new_p_style_attr = '''
margin: 3; 
 font-size: 17px;
 line-height: 26px;
 font-family: Arial, sans-serif;
 text-align: center;

'''
# Tag attributes behave like dictionary entries; this replaces the whole style value
soup.select_one('p')['style'] = new_p_style_attr
print('NEW SOUP')
print(soup.prettify())

This will print out in terminal:

OLD SOUP
<td style="background-color: #e5e5e5;">
 <p style="
     margin: 0; 
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;
     ">
  10
 </p>
</td>

______________
NEW SOUP
<td style="background-color: #e5e5e5;">
 <p style="
margin: 3; 
 font-size: 17px;
 line-height: 26px;
 font-family: Arial, sans-serif;
 text-align: center;

">
  10
 </p>
</td>
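
The approach above replaces the whole style string. If only one declaration should change, one option is to parse the string into a dictionary, update the entry, and serialize it again. The following is a minimal sketch, not part of the original answer: the helper names parse_style, format_style and update_declarations are hypothetical, and it assumes no value contains a semicolon (a data: URL, for example, would break the split).

```python
def parse_style(style):
    """Split an inline style string into an ordered {property: value} dict."""
    declarations = {}
    for chunk in style.split(';'):
        if ':' not in chunk:
            continue  # skip blank or malformed pieces
        # Split on the first colon only, so values containing ':' stay whole
        prop, value = chunk.split(':', 1)
        declarations[prop.strip().lower()] = value.strip()
    return declarations


def format_style(declarations):
    """Serialize a {property: value} dict back into a one-line style string."""
    return ' '.join(f'{prop}: {value};' for prop, value in declarations.items())


def update_declarations(style, changes):
    """Return a new style string with the given declarations set or added."""
    declarations = parse_style(style)
    declarations.update(changes)
    return format_style(declarations)


# The p style from the question, as a plain string
p_style = """
     margin: 0;
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;
     """

new_p_style = update_declarations(p_style, {'line-height': '17px'})
print(new_p_style)
# margin: 0; font-size: 12px; line-height: 17px; font-family: Arial, sans-serif; text-align: center;
```

With a parsed tag, the result would be assigned back in the same way as above, for example soup.select_one('p')['style'] = update_declarations(soup.select_one('p')['style'], {'line-height': '17px'}). Note that this sketch collapses the original line breaks inside the attribute into a single line.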

Here is the documentation for BeautifulSoup:

https://beautiful-soup-4.readthedocs.io/en/latest/index.html

CodePudding user response:

A regex approach. Collect the new style information in a dictionary of (tag_name, declaration) pairs and pass it to the update_style function, which modifies the soup in place.

The substitution tolerates extra whitespace and a trailing ; in the new value. Accepting **subs keyword arguments instead of a dictionary would also be possible.

from bs4 import BeautifulSoup as bs
import re


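def update_style(soup, subs):
    # subs maps a tag name to a single 'property: value' declaration
    for tag_name, attr in subs.items():
        tag = soup.find(tag_name)
        # Split on the first colon only, so values containing ':' survive
        k, v = attr.split(':', 1)
        # Replace the existing declaration, up to and including its first ';'
        tag['style'] = re.sub(rf'{k}:(.+?);', f'{k}: {v.strip(" ;")};', tag['style'])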


html = '''
<td style="background-color: #e5e5e5;">
  <p style="
     margin: 0;
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;
     ">10</p>
</td>
'''

soup = bs(html, 'lxml')

subs = {'td': 'background-color: #762157;', 'p': 'line-height: 17px;'}
update_style(soup, subs)

print(soup.prettify())
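
The **subs variant mentioned above could look roughly like the following. This is a sketch under a few assumptions rather than the answerer's code: it uses html.parser instead of lxml so that no extra package is needed, it escapes the property name before building the pattern, and it silently skips tag names that are missing or have no style attribute.

```python
import re
from bs4 import BeautifulSoup


def update_style(soup, **subs):
    """Rewrite one CSS declaration on the first tag matching each keyword name."""
    for tag_name, declaration in subs.items():
        tag = soup.find(tag_name)
        if tag is None or not tag.has_attr('style'):
            continue  # nothing to update for this tag name
        prop, value = (part.strip(' ;') for part in declaration.split(':', 1))
        # The lookbehind keeps "height" from matching inside "line-height"
        pattern = rf'(?<![\w-]){re.escape(prop)}\s*:[^;]*'
        # A function replacement avoids backslash processing of the new value
        tag['style'] = re.sub(pattern, lambda m: f'{prop}: {value}', tag['style'])


html = '''
<td style="background-color: #e5e5e5;">
  <p style="
     margin: 0;
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;
     ">10</p>
</td>
'''

soup = BeautifulSoup(html, 'html.parser')
update_style(soup, td='background-color: #762157;', p='line-height: 17px;')
print(soup.prettify())
```

Keyword arguments only work here because tag names are valid Python identifiers; the CSS property names stay inside the string values, so hyphens are not a problem.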