Select all items after the first element of a type

Time:12-12

I'm trying to select all the text within a span after the first <hr> tag.

I'm using this page as my test link: https://13ulbasaur.github.io/RandomTesting/

<span id="selectorID">
  <b>Header Text</b>
  Some more header text.
  <hr>
  Body text that I want starts here, it may also include <a href="www.google.com">links</a>, <b>bolded text</b>, and even...
  <ul>
    <li>Lists!</li>
    <li>With a bunch of items.</li>
    <li>I want these too.</li>
  </ul>
  Then after all of that, it may also include
  <hr>
  Another HR, <b>but I want this text too that comes after this.</b> As long as it's after the first hr.
</span>

I want all of the text, including anything inside lists, links, and so on, and anything after a second or later <hr>, as long as it's within the span with the ID selectorID and comes after the first <hr> tag.

The closest I got was the expression below, but it doesn't return any text that sits inside additional tags. That makes sense, since text nested inside another tag no longer has the hr as a sibling.

//span[contains(@id,'selectorID')]/descendant-or-self::*/text()[count(preceding-sibling::hr)>0]

What would be the right way to do this? Ideally I'd rather not use text() at all, because it would be nice to see where there are line breaks and so on.

CodePudding user response:

I'm not sure I understand exactly what you're after, but try changing your XPath expression to

//span[@id='selectorID']//self::node()[preceding::hr[preceding::span]]//.

and see if it works.

If you want to get rid of the duplicates, you can wrap the whole function with unique().
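For checking the selection logic outside of XPath, here is a minimal Python sketch using only the standard library. It is not the expression above; it just walks the markup in document order and keeps every piece of text that sits inside the target span and comes after the first hr, however deeply it is nested. The class name AfterFirstHr and the shortened SAMPLE markup are made up for illustration. A possibly simpler XPath with the same intent would be //span[@id='selectorID']/hr[1]/following-sibling::node(), though that is untested against the page in the question.

```python
from html.parser import HTMLParser

# Shortened version of the markup from the question (illustrative only).
SAMPLE = """<span id="selectorID">
  <b>Header Text</b>
  Some more header text.
  <hr>
  Body text with <a href="www.google.com">links</a> and
  <ul>
    <li>Lists!</li>
    <li>I want these too.</li>
  </ul>
  <hr>
  Another HR, <b>but I want this text too.</b>
</span>"""


class AfterFirstHr(HTMLParser):
    """Collect text inside <span id=span_id> that follows its first <hr>."""

    def __init__(self, span_id):
        super().__init__()
        self.span_id = span_id
        self.depth = 0        # span nesting depth; 0 means outside the target span
        self.seen_hr = False  # becomes True at the first <hr> inside the span
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if tag == "span" and dict(attrs).get("id") == self.span_id:
                self.depth = 1
        elif tag == "span":
            self.depth += 1
        elif tag == "hr":
            self.seen_hr = True

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Nesting does not matter here: only document order and the span boundary.
        if self.depth > 0 and self.seen_hr and data.strip():
            self.chunks.append(data.strip())


parser = AfterFirstHr("selectorID")
parser.feed(SAMPLE)
parser.close()
text_after_hr = " ".join(parser.chunks)
print(text_after_hr)
```

Because the check is "has an hr already been seen inside this span" rather than "is the hr a sibling", text inside the links, bold tags, and list items is kept, which is the part the preceding-sibling attempt in the question misses.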

CodePudding user response:

Use importdata() and regexextract(), like this:

=regexextract(
  join(char(10), importdata("https://13ulbasaur.github.io/RandomTesting/", "µ")),
  "selectorID\b[\s\S]+?<hr>\s*([\s\S]+?)\s*</span>"
)
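To see what that pattern captures, here is a small Python sketch that applies it to a shortened copy of the markup from the question. It assumes the quantifiers in the pattern are the lazy +? form, and it uses Python's re module rather than the spreadsheet's regex engine, so treat it as an approximation of what the formula returns. The helper name after_first_hr and the SAMPLE string are made up for illustration.

```python
import re

# Shortened version of the markup from the question (illustrative only).
SAMPLE = """<span id="selectorID">
  <b>Header Text</b>
  Some more header text.
  <hr>
  Body text, with <a href="www.google.com">links</a> and
  <ul>
    <li>Lists!</li>
  </ul>
  <hr>
  Another HR, <b>but I want this text too.</b>
</span>"""

# Lazy match up to the first <hr> after "selectorID", then capture
# everything up to the first closing </span>.
PATTERN = re.compile(r"selectorID\b[\s\S]+?<hr>\s*([\s\S]+?)\s*</span>")


def after_first_hr(html):
    """Return the raw markup between the first <hr> and </span>, or None."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


body = after_first_hr(SAMPLE)
print(body)
```

The capture is raw markup, so tags such as the second hr and the list items come back along with the text, which keeps line breaks visible. The pattern stops at the first closing span tag, so it would cut the result short if the body itself contained a nested span.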