Remove specific tag with its contents using sed

Time:07-06

I would like to remove the following tag from HTML, including its constantly varying contents:

<span >li4tuq734g23r74r7Whatever</span>

The following Bash script

.... | sed -e :a -re 's/<span />.*</span>//g' > "$NewFile"

ends with the error

sed: -e expression #2, char XX: unknown option to `s'

I tried escaping quotes, slashes, and "less than" symbols in various combinations, but I still get this error.
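A minimal sketch of why escaping quotes and < does not help: with / as the delimiter, only the slashes matter. The sample input is made up and mirrors the tag above.

```shell
#!/bin/sh
# Hypothetical sample input, mirroring the tag from the question.
input='<p>Keep <span >li4tuq734g23r74r7Whatever</span>this</p>'

# Broken: with "/" as the delimiter, sed splits the script at the first three
# unescaped slashes (pattern "<span ", replacement ">.*<") and reads the
# leftover "span>//g" as flags, which it rejects.
printf '%s\n' "$input" | sed 's/<span />.*</span>//g' 2>/dev/null ||
  echo "broken command rejected"

# Fixed: drop the stray "/" after "<span " and escape the one in "</span>".
# "<" and ">" are ordinary characters in a sed regex and need no escaping.
printf '%s\n' "$input" | sed 's/<span >[^<]*<\/span>//g'
```

The first command should print `broken command rejected` (sed's own error message is discarded by `2>/dev/null`), and the second prints `<p>Keep this</p>`.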

Answer:

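I suggest using a different separator than / when / appears in the text you want to match. sed ends each part of the s command at the next unescaped /, so the slashes in your pattern cut it short and the leftover text is read as flags, which is what "unknown option to `s'" means. Also, prefer -E over -r for extended regular expressions, since -E is the POSIX spelling and is accepted by both GNU and BSD sed. Note too that the / after <span in your pattern doesn't belong there.

Finally, .* is greedy: on a line with several spans it matches from the first <span > to the last </span>, deleting everything in between. It's better to match [^<]*, that is, any run of characters other than <.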
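A small sketch of the greedy-match problem, using a made-up line that contains two spans:

```shell
#!/bin/sh
# Made-up line containing two spans, to show the difference.
line='a <span >one</span> b <span >two</span> c'

# Greedy .* matches from the first "<span >" to the LAST "</span>",
# so the text " b " between the two spans is deleted as well.
printf '%s\n' "$line" | sed -E 's,<span >.*</span>,,g'

# [^<]* cannot run past a "<", so each span is matched and removed on its own.
printf '%s\n' "$line" | sed -E 's,<span >[^<]*</span>,,g'
```

The first command prints `a  c` (the ` b ` between the spans is lost), and the second prints `a  b  c`. The trade-off is that [^<]* also refuses to match a span with nested markup, such as `<span ><b>x</b></span>`, which is one reason an HTML parser is more robust.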

sed -E 's,<span >[^<]*</span>,,g'
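Applied to a file, as in the question's `> "$NewFile"` redirect, the command might look like the sketch below. The scratch directory and file names are hypothetical, and the input deliberately includes a span broken across two lines: sed processes one line at a time, so that span is left untouched.

```shell
#!/bin/sh
# Scratch directory and file names are hypothetical, for illustration only.
dir=$(mktemp -d)
InFile="$dir/page.html"
NewFile="$dir/page.stripped.html"

# Made-up input: one span on a single line, one span split across two lines.
printf '%s\n' \
  '<p>Keep <span >li4tuq734g23r74r7Whatever</span>this</p>' \
  '<span >split' \
  'across lines</span>' > "$InFile"

# sed reads the file line by line and writes the result to "$NewFile".
sed -E 's,<span >[^<]*</span>,,g' "$InFile" > "$NewFile"

# Keep the result for inspection, then remove the scratch directory.
result=$(cat "$NewFile")
rm -r "$dir"
printf '%s\n' "$result"
```

Only the first line changes (to `<p>Keep this</p>`); the split span on the last two lines survives unchanged.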

A more robust option, of course, is to use an HTML parser for this.
