Retrieve a sample or a given number of nodes with xmlstarlet

Time:11-06

I'm working with a huge XML file and I need to get a sample of 500 nodes that are direct children of the root node. I know they are all of the same type. I also need all the children of those 500 nodes.

Is there a way to do this with xmlstarlet?

I'd prefer using this specific package because I'm already using it to do other manipulations of the same file.

I tried looking through the package's help page but couldn't find a way.

Answer:

You could try:

xmlstarlet sel -t -c "/root/child[position() <= 500]" file.xml
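  • sel is xmlstarlet's subcommand for querying XML
  • -t starts a template, and sel always needs at least one
  • -c copies whatever the XPath expression that follows it selects
    (a deep copy, so all children of each selected node come along)
  • /root/child is the XPath
    (replace root and child with your actual element names, obviously)
  • [position() <= 500] keeps only the child elements whose position among
    the matching children of root is 500 or less, i.e. the first 500 in
    document order (not a random sample)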

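Enclosing the path in parentheses makes the predicate apply to the whole selected node set instead of to each parent's children separately:

xmlstarlet sel -t -c "(/root/child)[position() <= 500]" file.xml

This only makes a difference when the path can match nodes under more than one parent (for example //child), where the unparenthesized form returns up to 500 nodes per parent. For /root/child every match has the same parent, so both forms select the same nodes and the first command is enough.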


So, given an input of:

<root>
    <child>...</child>
    <child>...</child>
    ...
</root>

you would get:

<child>...</child><child>...</child>...

Mind you, this is not well-formed XML: the copied nodes are simply concatenated, with no single root element around them.

To separate with newlines, try a variation like:

xmlstarlet sel -t -m "/root/child[position() <= 500]" -c "." -n file.xml
  • -m loops over the nodes matched by the XPath
    (by itself it produces no output)
  • -c "." copies the current matched node, children included
  • -n appends a newline after each matched/copied node