Linux sed exercise

Time:11-21

I have this exercise, which I cannot solve.

Replace all <para> and </para> tags with the null string. If the resulting line is empty, delete the line. (You may need to use curly braces to make this happen.)

This is a part of the document I've got (web.docbook):

<para>
This is an article about the World Wide Web.
The World Wide Web is a collection of documents that are linked to
one another. The Web is <emphasis>not</emphasis> the same as the
Internet. The Internet is a world-wide network of networks, and it
does far more than simply serve up Web pages.
</para>

<para>Tim Berners-Lee, the inventor of the World Wide Web, put special
emphasis on the portability of web pages. Rather than create a
proprietary format, he made Web pages dependent only upon plain ASCII
text.</para>

<para>
Web pages are written in a markup language called HTML. Here is what it
looks like. The &lt; and &gt; mark off elements.
</para>

<listing>
&lt;body&gt;
&lt;div id="top-navig"&gt;
&lt;a id="top"&gt;&lt;/a&gt;
&lt;a href="index.html"&gt;CIT 040 Index&lt;/a&gt;
&amp;gt;
Assignment 1
&lt;/div&gt;

I was able to solve the first part of the exercise (the replacement), and it works perfectly. However, I can't figure out how to delete only those empty lines that result from my replacement, without also deleting the lines that were already empty.

I would really appreciate it if you could help me!

CodePudding user response:

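I'm not sure whether you're actually supposed to use conditionals and curly braces, but this works just fine ;) and does what you're asking. It first deletes every line that consists of nothing but a <para> or </para> tag, then strips the remaining tags from all other lines, so the blank lines that were already in the file are left alone: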

$ sed -E '/^<\/?para>$/d;s/<\/?para>//g' web.docbook
This is an article about the World Wide Web.
The World Wide Web is a collection of documents that are linked to
one another. The Web is <emphasis>not</emphasis> the same as the
Internet. The Internet is a world-wide network of networks, and it
does far more than simply serve up Web pages.

Tim Berners-Lee, the inventor of the World Wide Web, put special
emphasis on the portability of web pages. Rather than create a
proprietary format, he made Web pages dependent only upon plain ASCII
text.

Web pages are written in a markup language called HTML. Here is what it
looks like. The &lt; and &gt; mark off elements.

<listing>
&lt;body&gt;
&lt;div id="top-navig"&gt;
&lt;a id="top"&gt;&lt;/a&gt;
&lt;a href="index.html"&gt;CIT 040 Index&lt;/a&gt;
&amp;gt;
Assignment 1
&lt;/div&gt;
$
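If the exercise really wants the curly braces, my guess is that it means grouping commands under one address: on lines that contain a tag, do the substitution, and only then check whether the line has become empty. Here is a minimal sketch of that idea. The sample file is a shortened, hypothetical excerpt of web.docbook created with mktemp purely for illustration, and the `d;}` spelling is chosen so that it should work with both GNU and BSD sed.

```shell
# Hypothetical shortened excerpt of web.docbook, written to a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<para>
This is an article about the World Wide Web.
</para>

<para>Tim Berners-Lee, the inventor of the World Wide Web, put special
text.</para>
EOF

# Only on lines that contain a para tag: strip the tags, then delete the
# line if nothing is left. Lines that were already empty never match the
# address, so they are kept.
result=$(sed -E '/<\/?para>/{s/<\/?para>//g;/^$/d;}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

On the sample above this should give the same result as the one-liner without braces. The two differ on a line such as `<para></para>`: the grouped version deletes it, while the version without braces leaves an empty line behind, because the line does not consist of a single tag.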