How to replace the first occurrence of a string in an HTML file with a shell script

In an HTML file, I need to replace only the first occurrence of:

<table id="any string" >

"any string" stands for whatever is inside the quotes. There is a space before the final > character.

expected output:

<table id="new string">

I think sed -i can probably do it, but I don't know how to match the "any string" part and replace only the first occurrence.

CodePudding user response：

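Using sed grouping and backreferences, you can keep the text around the quoted value while replacing the value itself and dropping the space at the end. The 0,/table id/ address (a GNU sed extension) limits the substitution to the first matching line.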

$ sed -i.backup '0,/table id/s/\(table id="\)[^"]*\("\) /\1new string\2/' input_file
<table id="new string">

The -i.backup option edits input_file in place and keeps a copy of the original as input_file.backup.
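
To see that only the first table is touched, the command can be tried on a throwaway file. This is a minimal sketch assuming GNU sed (the 0,/re/ address is not available in BSD/macOS sed); the temporary directory and the sample HTML are made up for illustration, and the attribute value is matched with [^"]* so the match stops at the closing quote.

```shell
# Hypothetical sample file containing two matching tables
tmpdir=$(mktemp -d)
cat > "$tmpdir/input_file" <<'EOF'
<html>
<table id="any string" >
<table id="another one" >
</html>
EOF

# GNU sed only: 0,/table id/ ends the address range at the first matching line,
# so the substitution is never attempted on later lines
sed -i.backup '0,/table id/s/\(table id="\)[^"]*\("\) /\1new string\2/' "$tmpdir/input_file"

# Show the edited file; the untouched original is in input_file.backup
cat "$tmpdir/input_file"
```

The second table keeps its original id and trailing space, and input_file.backup holds the unmodified content.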

CodePudding user response：

Assuming your "any string" really means any string, as in you don't know what it is and it could be anything, you have to use the quotes as delimiters. You mentioned sed, so here's a simple sed solution:

# GNU sed uses -r (or -E) for extended regexps, macOS sed needs -E instead
# s means substitute
# / slashes are delimiters surrounding the patterns, /before/after/
# [^ "] means any character that is *not* a space or a double quote
# + means one or more of those characters
# followed by a space
# ([^"]+) means one or more characters that are not a double quote, and remember what they are
# \1 use that first remembered pattern

sed -r 's/table id="[^ "]+ ([^"]+)"/table id="new \1"/' file.html

So it matches a table whose double-quoted ID contains a space, and replaces everything in the ID up to the first space with "new".

Examples:

<table id="any string" > -> <table id="new string" >
<table id="compact striped" > -> <table id="new striped" >
<table id="data compact striped" > -> <table id="new compact striped" >
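
These examples can be checked by piping the three lines through the substitution. The sketch below is one way to do it, not necessarily what the answerer ran: it uses -E (accepted by both GNU and macOS sed) and negated character classes so the match cannot run past the closing quote. The variable name result is just for illustration.

```shell
# Feed the three sample lines to sed and capture the transformed output
result=$(printf '%s\n' \
  '<table id="any string" >' \
  '<table id="compact striped" >' \
  '<table id="data compact striped" >' |
  sed -E 's/table id="[^ "]+ ([^"]+)"/table id="new \1"/')

# Print one transformed line per input line
printf '%s\n' "$result"
```

Note that this substitution leaves the space before > in place and is applied to every matching line, not only the first.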

If "any string" really means any string at all, not necessarily containing a space (e.g. "foo"), and "new string" means any new string (e.g. "bar"), then the problem is a whole lot simpler:

sed -r 's/table id="[^"]+"/table id="new"/' file.html
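
The substitutions in this answer run on every matching line, while the question asks for the first occurrence only. One portable way to get that, shown here as a sketch using awk rather than sed (a swapped-in technique, not the answerer's method), is to stop substituting once a flag is set. The sample file, the done flag, and the replacement text are assumptions for illustration.

```shell
# Hypothetical sample file; point "file" at your own HTML file instead
file=$(mktemp)
cat > "$file" <<'EOF'
<table id="any string" >
<p>text</p>
<table id="other table" >
EOF

# sub() returns 1 when it replaces something on the current line;
# once "done" is set, !done is false and sub() is never called again
awk '!done && sub(/<table id="[^"]*" >/, "<table id=\"new string\">") { done = 1 } { print }' \
  "$file" > "$file.tmp" && mv "$file.tmp" "$file"

# Show the result: only the first table line is rewritten
cat "$file"
```

Because awk has no in-place option in POSIX, the output goes to a temporary file that then replaces the original; keep a copy first if you need a backup.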