Linux replace the nth instance of a line matching a pattern in a file

Time:04-10

I have a file like this:

        <div class='items'>
          <div class='item'>
            <div class='itemDescription'>random string 1</div>
            <div class='itemDate'>random date 1</div>
          </div>
          <div class='item'>
            <div class='itemDescription'>random string 2</div>
            <div class='itemDate'>random date 2</div>
          </div>
          <div class='item'>
            <div class='itemDescription'>random string 3</div>
            <div class='itemDate'>random date 3</div>
          </div>
          <div class='item'>
            <div class='itemDescription'>random string 4</div>
            <div class='itemDate'>random date 4</div>
          </div>
        </div>

I need to be able to replace (in this case, delete) the nth occurrence of an item, where an item is a group of lines in the file. For example, when n=3:

        <div class='items'>
          <div class='item'>
            <div class='itemDescription'>random string 1</div>
            <div class='itemDate'>random date 1</div>
          </div>
          <div class='item'>
            <div class='itemDescription'>random string 2</div>
            <div class='itemDate'>random date 2</div>
          </div>
          <div class='item'>
            <div class='itemDescription'>random string 4</div>
            <div class='itemDate'>random date 4</div>
          </div>
        </div>

And when n=2:

        <div class='items'>
          <div class='item'>
            <div class='itemDescription'>random string 1</div>
            <div class='itemDate'>random date 1</div>
          </div>
          <div class='item'>
            <div class='itemDescription'>random string 3</div>
            <div class='itemDate'>random date 3</div>
          </div>
          <div class='item'>
            <div class='itemDescription'>random string 4</div>
            <div class='itemDate'>random date 4</div>
          </div>
        </div>

How would I be able to accomplish this with sed?

I was hoping for something like:

sed -i "/\s*<div class='item'>/3d";
sed -i "/\s*<div class='itemDescription'>random string 4</div>/3d";
sed -i "/\s*<div class='itemDate'>random date 4</div>/3d";
sed -i "/\s*</div>/3d";

Above, 3d would mean: delete the 3rd match and leave the other matches alone.

Using a range where n=3:

sed -i "/\s*<div class='item'>/3,/\s*</div>/{///!d}";

Above, /\s*<div class='item'>/3 would mean: start the range at the third match of the pattern instead of the first.

None of the above are valid sed statements, but they should give an idea of what I'm looking for.

I'm also open to using awk or another tool, e.g. awk -i inplace "..." file

Also, I don't think deleting a fixed number of lines after the match is a good idea, in case the random string becomes multi-line.

I hope this is clear. Thanks for any help in advance.

Search terms...

"linux replace the nth instance of a line in a file"

"linux replace the nth occurrence of a line in a file"

"bash replace the nth occurrence of a line in a file"

CodePudding user response:

Regular expressions and tools like sed are the wrong choice for working with structured data like XML. Instead, you want something that understands XML and can manipulate documents based on XPath expressions. xmlstarlet is one popular tool of that kind. For example, to delete the third item div:

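$ xmlstarlet ed -d '//div[@class="item"][3]' example.xml
<?xml version="1.0"?>
<div class="items">
  <div class="item">
    <div class="itemDescription">random string 1</div>
    <div class="itemDate">random date 1</div>
  </div>
  <div class="item">
    <div class="itemDescription">random string 2</div>
    <div class="itemDate">random date 2</div>
  </div>
  <div class="item">
    <div class="itemDescription">random string 4</div>
    <div class="itemDate">random date 4</div>
  </div>
</div>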

Or using hxremove from w3's HTML-XML Utils package, which uses CSS selectors instead of XPath:

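$ hxremove '.item:nth-child(3)' < example.xml
<div class="items">
  <div class="item">
    <div class="itemDescription">random string 1</div>
    <div class="itemDate">random date 1</div>
  </div>
  <div class="item">
    <div class="itemDescription">random string 2</div>
    <div class="itemDate">random date 2</div>
  </div>
  
  <div class="item">
    <div class="itemDescription">random string 4</div>
    <div class="itemDate">random date 4</div>
  </div>
</div>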
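
Since the question needs this for different values of n, the position in the XPath expression can come from a shell variable. A minimal sketch, assuming xmlstarlet is installed and the items carry class='item' as in the question's input; the function name delete_nth_item is made up for illustration and is not part of xmlstarlet:

```shell
#!/bin/sh
# delete_nth_item FILE N  (hypothetical helper name)
# Print FILE to stdout without the N-th <div class='item'> of each parent.
delete_nth_item() {
    # Refuse anything that is not a plain positive integer, so that
    # arbitrary text never ends up inside the XPath expression.
    case $2 in
        ''|*[!0-9]*) echo "usage: delete_nth_item FILE N" >&2; return 1 ;;
    esac
    # Double quotes let the shell expand $2 inside the XPath expression;
    # the attribute value uses single quotes so no escaping is needed.
    xmlstarlet ed -d "//div[@class='item'][$2]" "$1"
}
```

Calling delete_nth_item example.xml 2 should print the document without the second item. Like the command above, it only prints the result and leaves the file itself untouched.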

CodePudding user response:

Based on the answer from @Shawn:

xmlstarlet ed --pf --omit-decl --inplace -d '//div[@class="newsItem"][3]' file.html

Explanation:

# ed - edit
# --pf - preserve the original formatting
# --omit-decl - omit the XML declaration <?xml version="1.0"?>
# --inplace - save the changes to the file instead of only printing the result

sed -i '/^[[:space:]]*$/d' file.html # delete the empty lines left behind

input file.html:

<div>
  <style>
    /*...*/
  </style>
  <div >
    <div >
      <div >Latest News</div>
      <div >
        <div class='newsItem'>
          <div class='newsItemDate'>random date 1</div>
          <div class='newsItemHeading'>random heading</div>
        </div>
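        <div class='newsItem'>
          <div class='newsItemDate'>random date 2</div>
          <div class='newsItemHeading'>random heading 2</div>
        </div>
        <div class='newsItem'>
          <div class='newsItemDate'>random date 3</div>
          <div class='newsItemHeading'>random heading 3</div>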
        </div>
      </div>
    </div>
  </div>
</div>

output file.html before sed:

<div>
  <style>
    /*...*/
  </style>
  <div >
    <div >
      <div >Latest News</div>
      <div >
        <div class='newsItem'>
          <div class='newsItemDate'>random date 1</div>
          <div class='newsItemHeading'>random heading</div>
        </div>
        <div class='newsItem'>
          <div class='newsItemDate'>random date 2</div>
          <div class='newsItemHeading'>random heading 2</div>
        </div>

      </div>
    </div>
  </div>
</div>

output file.html after sed:

<div>
  <style>
    /*...*/
  </style>
  <div >
    <div >
      <div >Latest News</div>
      <div >
        <div class='newsItem'>
          <div class='newsItemDate'>random date 1</div>
          <div class='newsItemHeading'>random heading</div>
        </div>
        <div class='newsItem'>
          <div class='newsItemDate'>random date 2</div>
          <div class='newsItemHeading'>random heading 2</div>
        </div>
      </div>
    </div>
  </div>
</div>
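
For completeness, since the question also mentions awk: a rough sketch, assuming every item starts on a line containing only <div class='item'> and ends at the next line containing only </div>. It is a line-based hack, so the caveat from the first answer still applies: a nested div closed on its own line, or a different layout, will break it. The function name delete_nth_block is made up for illustration.

```shell
#!/bin/sh
# delete_nth_block FILE N  (hypothetical helper name)
# Print FILE without the N-th block that starts at a line holding only
# <div class=...item...> and ends at the next line holding only </div>.
delete_nth_block() {
    awk -v n="$2" '
        # Opening line of an item: count it and start skipping at the n-th.
        # The dots stand in for the quote characters around the class name.
        /^[ \t]*<div class=.item.>[ \t]*$/ {
            count++
            if (count == n) skipping = 1
        }
        # Print every line that is not part of the n-th item.
        !skipping { print }
        # A line holding only the closing tag ends the item being skipped.
        skipping && /^[ \t]*<\/div>[ \t]*$/ { skipping = 0 }
    ' "$1"
}
```

Because the block ends at the closing tag rather than after a fixed number of lines, a multi-line description inside the item is removed along with it. To write the result back, something like delete_nth_block file.html 3 > file.tmp && mv file.tmp file.html should do.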