Assume I have an HTML file like this:
<body>
<div id="a">
content of div a
<div id="b"> content of div b </div>
<div id="c"> content of div c </div>
</div>
<style>
#a {color: red; }
#b {color: green; }
#c {color: blue; }
</style>
</body>
I want to append a unique suffix (say, -suffix) to all ids, covering both id="..." attributes and #... selectors, so that the result is a file like this:
<body>
<div id="a-suffix">
content of div a
<div id="b-suffix"> content of div b </div>
<div id="c-suffix"> content of div c </div>
</div>
<style>
#a-suffix {color: red; }
#b-suffix {color: green; }
#c-suffix {color: blue; }
</style>
</body>
How do I accomplish this with standard Unix shell tools like sed, grep, or awk in a way that covers as many situations as possible?
My attempt:
I came up with the following sed command:
sed -e 's/id="\([-_a-zA-Z0-9]*\)"/id="\1-suffix"/g;s/#\([-_a-zA-Z0-9]*\)/#\1-suffix/g' index.html
This is actually two commands in one:
s/id="\([-_a-zA-Z0-9]*\)"/id="\1-suffix"/g
- substitutes id attributes id="..."
s/#\([-_a-zA-Z0-9]*\)/#\1-suffix/g
- substitutes id selectors #...
However, it's far from perfect. First, it only supports attribute values in double quotes (id="..."), and id values are limited to those matching [-_a-zA-Z0-9]*. Second, it clashes with hex colors, so a white color like #ffffff would become #ffffff-suffix. An id selector like #... should only get a suffix if a matching attribute id="..." exists.
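The hex color clash is easy to reproduce in isolation. This is a minimal reproduction that runs only the selector half of the command on a single made-up CSS line (the `css` and `result` variable names are just for illustration):

```shell
# A made-up CSS rule that contains both an id selector and a hex color.
css='#a {color: #ffffff; }'

# Apply only the selector substitution from the command above.
result=$(printf '%s\n' "$css" | sed -e 's/#\([-_a-zA-Z0-9]*\)/#\1-suffix/g')

echo "$result"
# prints: #a-suffix {color: #ffffff-suffix; }
```

The color value gets rewritten because the pattern has no way to tell a selector from any other token that starts with #.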
What is the best way to accomplish this?
Answer:
There are a lot of cases in your file, as you mentioned with the colour problem. My approach would be to process the file line by line using
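# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r a; do
    # some code that modifies "$a"
    printf '%s\n' "$a" >> outputfile.html
done < inputfile.html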
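That said, you can use
b=$(expr "$a" : "regex")
to pick out precisely the part you want to modify. Note that expr anchors the match at the start of the string and prints the text captured by \( \) in the regex, or the length of the match if there is no group. Only then run sed on $b to get what you want, and put the result back into $a.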
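If the shell loop gets unwieldy, the same "filter first, then substitute" idea can be sketched as a two-pass awk script instead. This is not the loop described above but an alternative: the first pass collects every value seen in an id attribute, and the second pass suffixes attributes and only those #... tokens whose name was collected. The function name add_id_suffix is made up for this example, the id syntax is assumed to be the [-_a-zA-Z0-9] class from the question, and each attribute is assumed to sit on a single line. It is a sketch, not an HTML parser: it would still misfire on attributes that merely end in id= (such as data-id=), and on a hex color that happens to equal an existing id.

```shell
#!/bin/sh
# Hypothetical helper: add_id_suffix FILE SUFFIX
# Prints FILE with SUFFIX appended to every id attribute value and to every
# #name token whose name was seen as an id attribute somewhere in FILE.
add_id_suffix() {
    awk -v suffix="$2" -v q="'" '
    BEGIN {
        name  = "[-_a-zA-Z0-9]+"                  # id syntax assumed from the question
        attr  = "id=[\"" q "]" name "[\"" q "]"   # id attribute, single or double quoted
        token = attr "|#" name                    # either an attribute or a #name token
    }
    # Pass 1 (first read of the file): remember every id attribute value.
    NR == FNR {
        line = $0
        while (match(line, attr)) {
            ids[substr(line, RSTART + 4, RLENGTH - 5)] = 1
            line = substr(line, RSTART + RLENGTH)
        }
        next
    }
    # Pass 2 (second read): rewrite attributes and known selectors only.
    {
        line = $0; out = ""
        while (match(line, token)) {
            tok = substr(line, RSTART, RLENGTH)
            out = out substr(line, 1, RSTART - 1)
            if (substr(tok, 1, 1) == "#") {
                id = substr(tok, 2)
                out = out tok ((id in ids) ? suffix : "")
            } else {
                # insert the suffix just before the closing quote
                out = out substr(tok, 1, RLENGTH - 1) suffix substr(tok, RLENGTH)
            }
            line = substr(line, RSTART + RLENGTH)
        }
        print out line
    }' "$1" "$1"
}
```

It could be called as add_id_suffix index.html -suffix > suffixed.html. The file is passed to awk twice so that selectors appearing before their matching attribute are still handled.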