I want to do a search and replace on the text content of HTML elements only, leaving tags and attributes untouched.
E.g., replacing foo with <b>bar</b> in
<div id="foo">foo <i>foo</i> hi foo hi</div>
should result in
<div id="foo"><b>bar</b> <i><b>bar</b></i> hi <b>bar</b> hi</div>
I already have a working version in Perl, but the HTML parser there is buggy:
#!/usr/bin/env perl
##
use strict;
use warnings;
use v5.34.0;
use Mojo::DOM;
##
my $input = do { local $/; <STDIN> };
my $dom = Mojo::DOM->new($input);
$dom->descendant_nodes->grep(sub { $_->type eq 'text' })
    ->each(sub {
        $_->replace(s/(sth)/<span >$1<\/span>/gr)
    });
say $dom;
CodePudding user response:
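It's not recommended to use string manipulation functions such as .replace and regexes on HTML strings. Since you are looking for a solution in that area, here is one anyway; properly, this should be done with BeautifulSoup. Note that the second .replace restores the first occurrence, which here is the id attribute value, so this only works when the one non-text match comes first in the string.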
html = """<div id="foo">foo <i>foo</i> hi foo hi</div>"""
res = html.replace("foo", "<b>bar</b>").replace("<b>bar</b>", "foo", 1)
print(res)
Output:
<div id="foo"><b>bar</b> <i><b>bar</b></i> hi <b>bar</b> hi</div>
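The chained str.replace above depends on the id attribute being the first occurrence of foo. As a hedged alternative that stays within the standard library, the sketch below uses html.parser.HTMLParser to re-emit the markup and apply the replacement only to text data. The names TextReplacer and replace_in_text are made up for this illustration, and the sketch also rewrites the contents of script and style elements, so treat it as a starting point rather than a complete solution.

```python
from html.parser import HTMLParser


class TextReplacer(HTMLParser):
    """Re-emit the markup as-is, replacing `old` only inside text data."""

    def __init__(self, old, new):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The only place where the replacement is applied.
        self.out.append(data.replace(self.old, self.new))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def replace_in_text(markup, old, new):
    parser = TextReplacer(old, new)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(replace_in_text('<div id="foo">foo <i>foo</i> hi foo hi</div>',
                      "foo", "<b>bar</b>"))
```

Unlike the chained replace, this leaves an attribute such as title="foo" alone even when it appears after the first text match.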
CodePudding user response:
- Search all text nodes containing foo
- Create a b element
- Replace the text with the new element
- Insert the desired text into the b
from bs4 import BeautifulSoup, NavigableString, Tag
import re
import html
htmlString = '''
<div id="foo">foo <i>foo</i> hi foo hi</div>
'''
soup = BeautifulSoup(htmlString, "html.parser")
# Each matching text node is replaced as a whole by a new <b> element
for n in soup.find_all(string=re.compile('foo')):
    bold = soup.new_tag("b")
    n.replace_with(bold)
    bold.insert(0, 'bar')
print(soup)
Output (each matching text node is replaced as a whole, so the surrounding text, e.g. hi, is dropped):
<div id="foo"><b>bar</b><i><b>bar</b></i><b>bar</b></div>
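To keep the text around each match, one option is to split every matching text node at the search word and rebuild it from plain strings and new b elements. The following is a minimal sketch under that assumption, not the answerer's method; wrap_matches is a hypothetical helper name, and only plain text nodes are touched (comments and similar node types are skipped).

```python
import re

from bs4 import BeautifulSoup, NavigableString


def wrap_matches(markup, word, replacement_text, tag_name="b"):
    """Replace `word` in text nodes with <tag_name>replacement_text</tag_name>."""
    soup = BeautifulSoup(markup, "html.parser")
    pattern = re.compile(re.escape(word))

    # find_all returns a list, so the tree can be modified while looping.
    for node in soup.find_all(string=pattern):
        if type(node) is not NavigableString:
            continue  # skip comments and other special string types

        # Splitting "a foo b" on "foo" gives ["a ", " b"]; a new element
        # goes between each pair of neighbouring pieces.
        pieces = pattern.split(str(node))
        for i, piece in enumerate(pieces):
            if i > 0:
                tag = soup.new_tag(tag_name)
                tag.string = replacement_text
                node.insert_before(tag)
            if piece:
                node.insert_before(NavigableString(piece))

        # The original text node has been fully re-created; drop it.
        node.extract()

    return str(soup)


print(wrap_matches('<div id="foo">foo <i>foo</i> hi foo hi</div>', "foo", "bar"))
```

For the sample input this should print the markup requested in the question, with the id attribute untouched because attributes are never visited.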