Remove Block of HTML Table Data Linux Bash

Time:02-03

I have an HTML file that I process with a bash script, and I want to remove the empty tables from it. The file is generated from a SQL statement, and each table's header row is still written when the query returns no records. I want to remove those header-only tables.

<table border="1">
  <caption>Table with data</caption>
  <tr>
    <th align="center">type</th>
    <th align="center">column1</th>
    <th align="center">column2</th>
    <th align="center">column3</th>
    <th align="center">column4</th>
   </tr>
   
   Data rows exists here
   
  </table>

<table border="1">
  <caption>Empty Table To Remove</caption>
  <tr>
    <th align="center">type</th>
    <th align="center">column1</th>
    <th align="center">column2</th>
    <th align="center">column3</th>
    <th align="center">column4</th>
    <th align="center">column5</th>
    <th align="center">column6</th>
    <th align="center">column7</th>
  </tr>
</table>

<table border="1">
  <caption>Table with data</caption>
  <tr>
   <th align="center">type</th>
    <th align="center">column1</th>
    <th align="center">column2</th>
    <th align="center">column3</th>
    <th align="center">column4</th>
   </tr>
     Data rows exists here
  </table>

I tried a combination of grep and sed to remove the empty table. This worked while every table had the same number of columns, but I am having issues now that the tables have different numbers of columns.

When the column count was the same, I could loop over the tables by caption, count the lines, and remove the block. This no longer works because the number of columns varies.

CodePudding user response:

Like this, using xmlstarlet and sponge (from moreutils):

$ xmlstarlet format -H file.html | sponge file.html
$ xmlstarlet ed -d '//table[./caption/text()="Empty Table To Remove"]' file.html 
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <body>
    <table border="1"><caption>Table with data</caption><tr><th align="center">type</th><th align="center">column1</th><th align="center">column2</th><th align="center">column3</th><th align="center">column4</th></tr>
   
   Data rows exists here
   
  </table>
    <table border="1"><caption>Table with data</caption><tr><th align="center">type</th><th align="center">column1</th><th align="center">column2</th><th align="center">column3</th><th align="center">column4</th></tr>
     Data rows exists here
  </table>
  </body>
</html>

To edit in place like sed -i, use

xmlstarlet edit -L ...
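
The caption match above targets one known table. If the goal is to drop every header-only table regardless of its caption or column count, the XPath predicate can test for the absence of data cells instead. The following is only a sketch under an assumption not stated in the question: that real data rows are written as <tr><td>...</td></tr> while header rows contain only <th> cells. The function name remove_empty_tables is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: delete every <table> that contains no <td> cell at any depth.
# Assumption: data rows use <td>, header rows use only <th>.
# remove_empty_tables is a hypothetical helper name, not part of xmlstarlet.

remove_empty_tables() {
  # $1: path to the HTML file; the cleaned document is printed to stdout.
  # Step 1: let the HTML parser turn the input into well-formed XML.
  # Step 2: delete tables matched by the XPath predicate not(.//td).
  xmlstarlet format -H "$1" |
    xmlstarlet ed -d '//table[not(.//td)]'
}

# Example call (file.html is the file name used in the answer above):
#   remove_empty_tables file.html > cleaned.html
```

Because the predicate looks at cell types rather than counting lines or columns, tables with five or eight header columns are handled the same way. If the generator marks data rows differently, the predicate would need to be adjusted accordingly.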

As a general rule, don't use sed or regular expressions to parse HTML/XML; use a real parser such as xmlstarlet.
