I have this exercise, which I cannot solve.
Replace all <para> and </para> tags with the null string. If the resulting line is empty, delete the line. (You may need to use curly braces to make this happen.)
This is a part of the document I've got (web.docbook):
<para>
This is an article about the World Wide Web.
The World Wide Web is a collection of documents that are linked to
one another. The Web is <emphasis>not</emphasis> the same as the
Internet. The Internet is a world-wide network of networks, and it
does far more than simply serve up Web pages.
</para>
<para>Tim Berners-Lee, the inventor of the World Wide Web, put special
emphasis on the portability of web pages. Rather than create a
proprietary format, he made Web pages dependent only upon plain ASCII
text.</para>
<para>
Web pages are written in a markup language called HTML. Here is what it
looks like. The < and > mark off elements.
</para>
<listing>
<body>
<div id="top-navig">
<a id="top"></a>
<a href="index.html">CIT 040 Index</a>
&gt;
Assignment 1
</div>
I was able to solve the first part of the exercise, and it works perfectly. However, I can't figure out how to delete only those empty lines that result from my replacement.
I would really appreciate it if you could help me!
Answer:
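Not sure whether you're actually supposed to use conditionals and curly braces, but this works just fine ;) and does to the text what you're asking. The first command deletes lines that consist of nothing but a para tag, and the second strips the tags from the lines that remain: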
$ sed -E '/^<\/?para>$/d;s/<\/?para>//g' web.docbook
This is an article about the World Wide Web.
The World Wide Web is a collection of documents that are linked to
one another. The Web is <emphasis>not</emphasis> the same as the
Internet. The Internet is a world-wide network of networks, and it
does far more than simply serve up Web pages.
Tim Berners-Lee, the inventor of the World Wide Web, put special
emphasis on the portability of web pages. Rather than create a
proprietary format, he made Web pages dependent only upon plain ASCII
text.
Web pages are written in a markup language called HTML. Here is what it
looks like. The < and > mark off elements.
<listing>
<body>
<div id="top-navig">
<a id="top"></a>
<a href="index.html">CIT 040 Index</a>
&gt;
Assignment 1
</div>
$
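Since the exercise hints at curly braces, here is a sketch of what that variant might look like. The braces group two commands so that they run only on lines that contain a para tag: strip the tags, then delete the line if nothing is left. Lines that were already empty before the substitution never enter the group, so they are kept. The function name strip_para is only a hypothetical wrapper for illustration, and I'm assuming a sed that accepts -E (GNU and BSD sed both do).

```shell
# strip_para is a hypothetical helper name, not part of the exercise.
# It reads the named files, or standard input if none are given.
strip_para() {
  # Address /<\/?para>/ selects only lines containing <para> or </para>.
  # Inside the braces: remove every tag, then delete the line if it is now empty.
  sed -E '/<\/?para>/{
    s/<\/?para>//g
    /^$/d
  }' "$@"
}
```

You would call it as strip_para web.docbook. On the sample above it should give the same output as the one-liner; the difference only shows up when the input already contains blank lines, or a line such as <para></para> that becomes empty after the substitution.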