Select only the first n matched elements using bash

Using this command:

sed -n '/<article class.*article--nyheter/,/<\/article>/p' news2.html > onlyArticles.html

I get all the article tags in my HTML document. There are about 50 articles.

Sample input:

<article class="article column large-12 small-12 article--nyheter">
    ... variable number of lines of data
</article>

<article class="article column large-12 small-12 article--nyheter">
    ... variable number of lines of data
</article>

<article class="article column large-12 small-12 article--nyheter">
    ... variable number of lines of data
</article>

<article class="article column large-12 small-12 article--nyheter">
    ... variable number of lines of data
</article>

I want only the first x articles, for example just the top 2.

Output:

<article class="article column large-12 small-12 article--nyheter">
    ... variable number of lines of data
</article>

<article class="article column large-12 small-12 article--nyheter">
    ... variable number of lines of data
</article>

This is just an example. What I am trying to achieve is to select only the first x matching nodes.

Is there any way to do it? I cannot simply use head or tail, because I need to extract the matching elements, not just a fixed number of lines.

CodePudding user response：

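xmllint can select elements by position with an XPath expression. Wrapping //article in parentheses makes position() count across the whole document rather than separately under each parent element:

xmllint --html --recover --xpath '(//article)[position()<=2]' tmp.html 2>/dev/null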
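
If an HTML parser is not available, the range match from the question can also be expressed in awk, which can stop after a given number of blocks. The following is only a sketch under some assumptions: the function name first_articles is made up for illustration, each opening and closing article tag sits on its own line as in the sample input, and articles are not nested.

```shell
# Hypothetical helper: print only the first n matching <article> blocks.
# Usage: first_articles N FILE
# Assumes every opening and closing tag is on its own line and that
# article elements are not nested inside each other.
first_articles() {
    n=$1
    file=$2
    awk -v n="$n" '
        # Opening tag of a matching article: start printing.
        /<article class.*article--nyheter/ { inside = 1 }

        # Print every line while inside a block (including both tags).
        inside { print }

        # Closing tag: end the block and quit once n blocks were printed.
        /<\/article>/ && inside {
            inside = 0
            if (++count >= n) exit
        }
    ' "$file"
}

# Example call (file names taken from the question):
#   first_articles 2 news2.html > onlyArticles.html
```

Because awk exits as soon as the n-th closing tag is seen, the rest of the file is never read. For HTML that is not laid out one tag per line, a real parser such as xmllint is the safer choice.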