Use xmlstarlet in bash script to link xml nodes and extract elements

Time:02-11

I am fairly familiar with XML, but I've never had to delve into its complexities, and I'm a novice with XSL, XPath, and xmlstarlet. I need a shell script that will extract a simple list from XML files that follow this format:

<tv>
  <channel id="ffd49fe9acd778774b4933e10b6afb75">
    <display-name>ITV</display-name>
  </channel>
  <channel id="398fe9bc5ee3b47556d06f9b9cb95562">
    <display-name>BBC One</display-name>
  </channel>
  <channel id="3932fa4d11310596e0c3eba7dc8caf4c">
    <display-name>Channel 4</display-name>
  </channel>
  <programme channel="398fe9bc5ee3b47556d06f9b9cb95562">
    <title>Pointless</title>
  </programme>
  <programme channel="ffd49fe9acd778774b4933e10b6afb75">
    <title>The Chase</title>
  </programme>
  <programme channel="398fe9bc5ee3b47556d06f9b9cb95562">
    <title>BBC News</title>
  </programme>
  <programme channel="3932fa4d11310596e0c3eba7dc8caf4c">
    <title>Naked Attraction</title>
  </programme>
</tv>

I picked up xmlstarlet (I'm using Cygwin) and can easily produce a list of channels or a list of programmes, but linking the two together is beyond me, although I've done a LOT of Googling.

#List of channels
xmlstarlet sel -t -m /tv/channel -v display-name -n

#List of Programmes
xmlstarlet sel -t -m /tv/programme -v title -n

#Filter a particular channel
xmlstarlet sel -t -m "tv" -v 'channel[@id="398fe9bc5ee3b47556d06f9b9cb95562"]/display-name' -n

It seemed to me a simple thing to replace the bracketed predicate in the last command with something like [@id=programme/@channel], but XPath doesn't seem to work like that. I'm looking for results like this:

Pointless, BBC One
The Chase, ITV
BBC News, BBC One
Naked Attraction, Channel 4

Can someone point me in the right direction please?

Answer:

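The problem is the context node: inside the predicate, a relative path such as programme/@channel is evaluated against each <channel> element, not against the <programme> you are iterating over. One way around this is to store the programme's channel attribute in a variable before switching context: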

xmlstarlet sel -t -m '/tv/programme' --var ch='@channel' -v 'title' \
  -v '", "' -v '/tv/channel[@id=$ch]/display-name' -n

Output:

Pointless, BBC One
The Chase, ITV
BBC News, BBC One
Naked Attraction, Channel 4