I would like to remove the following tag from HTML, including its constantly varying contents:
<span >li4tuq734g23r74r7Whatever</span>
The following Bash script
.... | sed -e :a -re 's/<span />.*</span>//g' > "$NewFile"
ends with the error
sed: -e expression #2, char XX: unknown option to `s'
I tried escaping quotes, slashes and "less than" symbols in various combinations, but I still get this error.
CodePudding user response:
I suggest using a different sed separator than / when / is contained within the thing you want to match on. Also, prefer -E instead of -r for extended regex to be POSIX compatible. Also note that you have a / in the first span in your regex that doesn't belong there.
Also, .* will make it overly greedy and eat up any </span> that follows the first </span> on the line. It's better to match on [^<]* instead. That is, any character that is not <.
sed -E 's,<span >[^<]*</span>,,g'
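To see the difference between the two patterns, here is a small sketch you can run. The sample line in input is made up for illustration (it is not from the question), and the variable names are just placeholders. It also shows that keeping / as the separator works too, as long as the / inside </span> is escaped.

```shell
#!/bin/sh
# Hypothetical sample line: two spans with different contents on one line.
input='<p>keep <span >li4tuq734g23r74r7Whatever</span> this <span >other</span> text</p>'

# Greedy: .* runs from the first <span > to the LAST </span> on the line,
# so the text between the two spans is removed as well.
greedy=$(printf '%s\n' "$input" | sed -E 's,<span >.*</span>,,g')

# Bounded: [^<]* stops at the next <, so each span is removed on its own.
bounded=$(printf '%s\n' "$input" | sed -E 's,<span >[^<]*</span>,,g')

# Same pattern with / as the separator: the / in </span> must be escaped,
# otherwise sed treats it as the end of the pattern.
escaped=$(printf '%s\n' "$input" | sed -E 's/<span >[^<]*<\/span>//g')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
printf 'escaped: %s\n' "$escaped"
```

The greedy variant should lose the word between the two spans, while the bounded and escaped variants should keep it and print the same result. Note that [^<]* only works when the span contains no nested tags.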
A better option is, of course, to use an HTML parser for this.