How do I edit attributes of HTML tags using Beautiful Soup


I have an HTML tag like the following:


<td style="background-color: #e5e5e5;">
  <p style="
     margin: 0; 
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;

I can update the value from 10 to 15 in the tag with BeautifulSoup:


However, I haven't figured out a way to update the values in the style tags, because they seem to be part of their parent, or 'base', tags.

For example, how would I update the tag to the following?

print(tag) --> 
<td style="background-color: #762157;">
   <p style="
      margin: 0;
      font-size: 12px;
      line-height: 17px;
      font-family: Arial, sans-serif;
      text-align: center;

That is, I want to change the background-color to #762157 and the line-height to 17px.

CodePudding user response:

In the HTML you posted, style is not a tag but an attribute of the p tag. Here is one way to modify the p tag's style attribute (the same approach works for the td style attribute):

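from bs4 import BeautifulSoup as bs

# The snippet from the question, closed off so that it parses cleanly.
html = '''
<td style="background-color: #e5e5e5;">
  <p style="
     margin: 0;
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;
  "></p>
</td>
'''

soup = bs(html, 'html.parser')
print('OLD SOUP')
print(soup)

# Current value of the style attribute, as a plain string.
p_style_attribute = soup.select_one('p').get('style')

# Replacement value for the whole style attribute.
new_p_style_attr = '''
margin: 3;
 font-size: 17px;
 line-height: 26px;
 font-family: Arial, sans-serif;
 text-align: center;
'''

# Attributes are assigned like dictionary items on the tag.
soup.select_one('p')['style'] = new_p_style_attr
print('NEW SOUP')
print(soup)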

This prints the following in the terminal:

<td style="background-color: #e5e5e5;">
 <p style="
     margin: 0; 
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;

<td style="background-color: #e5e5e5;">
 <p style="
margin: 3; 
 font-size: 17px;
 line-height: 26px;
 font-family: Arial, sans-serif;
 text-align: center;
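
If you only want to change one or two properties and keep the rest, you can split the style string into declarations, update the ones you need, and write it back. This is only a minimal sketch: the helper name set_style_properties and the underscore-for-hyphen keyword convention are made up for this example and are not part of Beautiful Soup. It assumes the style value is a simple list of "property: value" declarations separated by semicolons, so values that themselves contain a semicolon would break it.

```python
from bs4 import BeautifulSoup


def set_style_properties(tag, **changes):
    """Update individual CSS declarations in tag['style'], keeping the others."""
    declarations = {}
    for chunk in tag.get('style', '').split(';'):
        if ':' in chunk:
            name, value = chunk.split(':', 1)
            declarations[name.strip()] = value.strip()
    # Keyword names use underscores; CSS property names use hyphens.
    for name, value in changes.items():
        declarations[name.replace('_', '-')] = value
    # Write the declarations back as a single-line style string.
    tag['style'] = '; '.join(f'{k}: {v}' for k, v in declarations.items()) + ';'


html = '''
<td style="background-color: #e5e5e5;">
  <p style="
     margin: 0;
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;
  "></p>
</td>
'''

soup = BeautifulSoup(html, 'html.parser')
set_style_properties(soup.select_one('td'), background_color='#762157')
set_style_properties(soup.select_one('p'), line_height='17px')

print(soup.select_one('td')['style'])
print(soup.select_one('p')['style'])
```

Note that this rewrites the attribute on a single line, so the original line breaks and indentation inside the style value are not preserved.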


Here is the documentation for BeautifulSoup:


CodePudding user response:

A regex approach. Collect the new style information in a dictionary of (tag_name, declaration) pairs and pass it to the update_style function, which modifies the soup in place.

The substitution does not depend on surrounding whitespace or a trailing ; in the new declaration. Accepting the pairs as keyword arguments (**subs) would also be possible.

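from bs4 import BeautifulSoup as bs
import re

def update_style(soup, subs):
    # subs maps a tag name to a single 'property: value' declaration.
    for tag_name, attr in subs.items():
        tag = soup.find(tag_name)
        # Split on the first colon only, so values containing ':' survive.
        k, v = attr.split(':', 1)
        k = k.strip()
        # Replace the existing declaration for k, up to its closing semicolon.
        tag['style'] = re.sub(rf'{re.escape(k)}\s*:[^;]*;', f'{k}: {v.strip(" ;")};', tag['style'])

html = '''
<td style="background-color: #e5e5e5;">
  <p style="
     margin: 0;
     font-size: 12px;
     line-height: 16px;
     font-family: Arial, sans-serif;
     text-align: center;
  "></p>
</td>
'''

soup = bs(html, 'lxml')

subs = {'td': 'background-color: #762157;', 'p': 'line-height: 17px;'}
update_style(soup, subs)
print(soup)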
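
The **subs variant mentioned above could look roughly like this. It is a sketch, not a tested drop-in: the name update_style_kw is made up here, html.parser is used only to avoid the extra lxml dependency, and it assumes every keyword is a tag name that exists in the soup and already has a style attribute. The lookbehind keeps a short property name such as height from matching inside line-height.

```python
import re
from bs4 import BeautifulSoup


def update_style_kw(soup, **subs):
    """Keyword form: update_style_kw(soup, td='background-color: #762157;')."""
    for tag_name, declaration in subs.items():
        tag = soup.find(tag_name)
        prop, value = declaration.split(':', 1)
        prop, value = prop.strip(), value.strip(' ;')
        # Match the whole existing declaration for prop, up to its semicolon.
        pattern = rf'(?<![\w-]){re.escape(prop)}\s*:[^;]*;?'
        # A function replacement avoids backslash processing in the new value.
        tag['style'] = re.sub(pattern, lambda m: f'{prop}: {value};', tag['style'])


html = (
    '<td style="background-color: #e5e5e5;">'
    '<p style="margin: 0; line-height: 16px; text-align: center;">text</p>'
    '</td>'
)

soup = BeautifulSoup(html, 'html.parser')
update_style_kw(soup, td='background-color: #762157;', p='line-height: 17px')

print(soup.find('td')['style'])
print(soup.find('p')['style'])
```

Keyword arguments only work here because tag names such as td and p are valid Python identifiers; the dictionary form is the safer choice if you ever need to target several tags with the same name.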
