Retrieving values and attributes from XML using XPath

Time:05-30

Junior .NET C# self-learner here. I've 'lost' a day today trying to wrap my head around XPath, XPath expressions, XPathDocument, XPathNodeIterator, and XPathNavigator. I think I must be doing something wrong because I can't believe it's this difficult to get names, values, and attributes from an XML document.

I was able to achieve my intended goal in PowerShell in just a few lines, in less than 10 minutes. Yet this has taken me an entire day and I still feel totally lost.

Can someone please help by:

  1. Confirming I'm on the right lines and not doing something completely crazy.
  2. Telling me if there's a simpler way to do this.
  3. Assisting me with getting the missing value/attributes.

Here's my XML document: https://gist.github.com/arbitmcdonald/3c5381e920fac7b880df68912bfddbd9

The data we're interested in (from the linked file above) are these bad boys:

<cdf:rule-result severity="medium" weight="10.0" time="2022-05-28T23:12:16" version="DTBI014-IE11" idref="xccdf_mil.disa.stig_rule_SV-59337r8_rule">
      <cdf:result>fail</cdf:result>
      <cdf:ident system="http://iase.disa.mil/cci">CCI-002450</cdf:ident>
      <cdf:fix id="F-50263r18_fix"></cdf:fix>
      <cdf:check system="http://oval.mitre.org/XMLSchema/oval-definitions-5">
            <cdf:check-content-ref href="#scap_mil.disa.stig_comp_U_MS_IE11_V1R16_STIG_SCAP_1-2_Benchmark-oval.xml" name="oval:mil.disa.fso.ie:def:580"></cdf:check-content-ref>
      </cdf:check>
</cdf:rule-result>

At the moment, using the XML snippet above, I'm getting the following output (per iteration):

severity = medium
weight = 10.0
time = 2022-05-28T23:12:16
version = DTBI014-IE11
idref = xccdf_mil.disa.stig_rule_SV-59337r8_rule
cdf:rule-result = failCCI-002450

What I'm trying to get is:

severity = medium
weight = 10.0
time = 2022-05-28T23:12:16
version = DTBI014-IE11
idref = xccdf_mil.disa.stig_rule_SV-59337r8_rule
result = fail
checkIdentifier = CCI-002450
fixId = F-50263r18_fix
checkSchema = http://oval.mitre.org/XMLSchema/oval-definitions-5
checkName = oval:mil.disa.fso.ie:def:580

Here's my playground/study code so far:

void Main()
{
    // This is shared above
    string path = @"C:\tmp\Example.xml"; 
    
    XPathNavigator nav;
    XPathDocument docNav;
    XPathNodeIterator NodeIter;
    String strExpression;
    
    // Open the XML.
    docNav = new XPathDocument(path);
    
    // Create a navigator to query with XPath.
    nav = docNav.CreateNavigator();
    strExpression = "/cdf:Benchmark/cdf:TestResult/cdf:rule-result";

    var nsmgr = new XmlNamespaceManager(nav.NameTable);
    nsmgr.AddNamespace("cdf", "http://checklists.nist.gov/xccdf/1.2");

    // Select the node and place the results in an iterator.
    NodeIter = nav.Select(strExpression, nsmgr);

    while (NodeIter.MoveNext())
    {
        XPathNavigator navigator2 = NodeIter.Current.Clone();
        navigator2.MoveToFirstAttribute();
        Console.WriteLine("{0} = {1}", navigator2.Name, navigator2.Value);

        while (navigator2.MoveToNextAttribute())
        {
            Console.WriteLine("{0} = {1}", navigator2.Name, navigator2.Value);
        }
        
        Console.WriteLine("{0} = {1}", NodeIter.Current.Name, NodeIter.Current.Value);

        Console.WriteLine();
    }
}

Thanks in advance for any pointers. Either I'm doing this all wrong, or this is bizarrely one of the toughest things I've ever had to do in .NET.

CodePudding user response:

It looks like you're iterating through the attributes of your <cdf:rule-result> nodes, but not through their child nodes. I haven't used XPathNavigator for a long time, but I'd suggest looking into the XPathNavigator.SelectChildren and XPathNavigator.SelectDescendants methods.
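
As a rough sketch of that suggestion (not run against the full file in the gist), the code below prints the attributes of each rule-result and then walks its descendant elements with SelectDescendants, using SelectChildren only to check whether an element is a leaf. The RuleResultDump class, the Describe and AddAttributes helpers, the "element/@attribute" label format, and the inline SampleXml are all made up for illustration; the sample simply wraps the question's snippet in the Benchmark/TestResult parents that the question's XPath assumes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class RuleResultDump
{
    // Hypothetical sample: the question's snippet wrapped in the parents its XPath expects.
    public const string SampleXml = @"<cdf:Benchmark xmlns:cdf='http://checklists.nist.gov/xccdf/1.2'>
  <cdf:TestResult>
    <cdf:rule-result severity='medium' weight='10.0' time='2022-05-28T23:12:16' version='DTBI014-IE11' idref='xccdf_mil.disa.stig_rule_SV-59337r8_rule'>
      <cdf:result>fail</cdf:result>
      <cdf:ident system='http://iase.disa.mil/cci'>CCI-002450</cdf:ident>
      <cdf:fix id='F-50263r18_fix'></cdf:fix>
      <cdf:check system='http://oval.mitre.org/XMLSchema/oval-definitions-5'>
        <cdf:check-content-ref href='#scap_mil.disa.stig_comp_U_MS_IE11_V1R16_STIG_SCAP_1-2_Benchmark-oval.xml' name='oval:mil.disa.fso.ie:def:580'></cdf:check-content-ref>
      </cdf:check>
    </cdf:rule-result>
  </cdf:TestResult>
</cdf:Benchmark>";

    // Returns "name = value" lines for every rule-result in the given XML text.
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        var nsmgr = new XmlNamespaceManager(nav.NameTable);
        nsmgr.AddNamespace("cdf", "http://checklists.nist.gov/xccdf/1.2");

        foreach (XPathNavigator ruleResult in nav.Select("/cdf:Benchmark/cdf:TestResult/cdf:rule-result", nsmgr))
        {
            // Attributes of the rule-result element itself.
            AddAttributes(ruleResult, "", lines);

            // SelectChildren alone would stop at direct children;
            // SelectDescendants also reaches check-content-ref inside check.
            foreach (XPathNavigator element in ruleResult.SelectDescendants(XPathNodeType.Element, false))
            {
                bool isLeaf = element.SelectChildren(XPathNodeType.Element).Count == 0;
                string text = element.Value.Trim();
                if (isLeaf && text.Length > 0)
                {
                    lines.Add($"{element.LocalName} = {text}");
                }
                AddAttributes(element, element.LocalName + "/@", lines);
            }
            lines.Add("");
        }
        return lines;
    }

    // Appends one line per attribute, working on a clone so the caller's navigator does not move.
    private static void AddAttributes(XPathNavigator element, string prefix, List<string> lines)
    {
        XPathNavigator attr = element.Clone();
        if (!attr.MoveToFirstAttribute()) return;
        do
        {
            lines.Add($"{prefix}{attr.LocalName} = {attr.Value}");
        } while (attr.MoveToNextAttribute());
    }

    public static void Main()
    {
        foreach (string line in Describe(SampleXml)) Console.WriteLine(line);
    }
}
```

This prints the original element and attribute names rather than the renamed labels from the question, and it also lists attributes the question did not ask for (such as the system attribute on cdf:ident).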

CodePudding user response:

I wouldn't mess around with C# code to do any of this, I would do it all in a single XSLT 3.0 stylesheet:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0" expand-text="yes" xmlns:cdf="http://checklists.nist.gov/xccdf/1.2">

<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:apply-templates select="/cdf:Benchmark/cdf:TestResult/cdf:rule-result"/>
</xsl:template>

<xsl:template match="cdf:rule-result">
severity = {@severity}
weight = {@weight}
time = {@time}
version = {@version}
idref = {@idref}
result = {cdf:result}
checkIdentifier = {cdf:ident}
fixId = {cdf:fix/@id}
checkSchema = {cdf:check/@system}
checkName = {cdf:check/cdf:check-content-ref/@name}
</xsl:template>
</xsl:transform>

It's not much more difficult in XSLT 1.0 if you prefer to use the Microsoft processor: just replace each expression in curly braces with <xsl:value-of select="..."/>.
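
For anyone who wants to try the XSLT 1.0 route from C#, here is a minimal sketch using XslCompiledTransform, the XSLT 1.0 processor that ships with .NET. The Xslt1Report class and its Run method are hypothetical names, the stylesheet is an XSLT 1.0 rendering of the same report (with the paths taken relative to each rule-result, using the /cdf:Benchmark/cdf:TestResult path from the question), and the input path is the one from the question's code.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class Xslt1Report
{
    // XSLT 1.0 stylesheet; single quotes keep it readable inside a C# verbatim string.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:cdf='http://checklists.nist.gov/xccdf/1.2'>
  <xsl:output method='text'/>
  <xsl:template match='/'>
    <xsl:for-each select='/cdf:Benchmark/cdf:TestResult/cdf:rule-result'>
      <xsl:text>severity = </xsl:text><xsl:value-of select='@severity'/><xsl:text>&#10;</xsl:text>
      <xsl:text>weight = </xsl:text><xsl:value-of select='@weight'/><xsl:text>&#10;</xsl:text>
      <xsl:text>time = </xsl:text><xsl:value-of select='@time'/><xsl:text>&#10;</xsl:text>
      <xsl:text>version = </xsl:text><xsl:value-of select='@version'/><xsl:text>&#10;</xsl:text>
      <xsl:text>idref = </xsl:text><xsl:value-of select='@idref'/><xsl:text>&#10;</xsl:text>
      <xsl:text>result = </xsl:text><xsl:value-of select='cdf:result'/><xsl:text>&#10;</xsl:text>
      <xsl:text>checkIdentifier = </xsl:text><xsl:value-of select='cdf:ident'/><xsl:text>&#10;</xsl:text>
      <xsl:text>fixId = </xsl:text><xsl:value-of select='cdf:fix/@id'/><xsl:text>&#10;</xsl:text>
      <xsl:text>checkSchema = </xsl:text><xsl:value-of select='cdf:check/@system'/><xsl:text>&#10;</xsl:text>
      <xsl:text>checkName = </xsl:text><xsl:value-of select='cdf:check/cdf:check-content-ref/@name'/><xsl:text>&#10;&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the plain-text report.
    public static string Run(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // Path taken from the question's code.
        Console.Write(Run(File.ReadAllText(@"C:\tmp\Example.xml")));
    }
}
```

Note that value-of in XSLT 1.0 outputs only the first matching node, so a rule-result with several cdf:ident children would show just the first identifier.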

CodePudding user response:

If you want to stay with XPath 1.0, you can at least do something like this:

        foreach (XPathNavigator result in nav.Select(strExpression, nsmgr))
        {
            foreach (XPathNavigator value in result.Select(".//@* | .//*[normalize-space()]", nsmgr))
            {
                Console.WriteLine("{0} = {1}", value.LocalName, value.Value);
            }
            Console.WriteLine();
        }

This collects every attribute and every non-empty element under each rule-result, so it covers all the data you have shown plus two or three values that are in the sample but not in your wanted output. It prints the original names; it won't rename things, obviously. It isn't clear which values you want and which you don't, so I have not attempted to exclude any in the XPath so far.
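
If only the ten values from the question are wanted, under the question's own labels, one possible approach is a small table of label-to-XPath pairs evaluated relative to each rule-result. The sketch below assumes that layout; the RuleResultFields class, the Fields table, and the Extract method are hypothetical names, and the relative paths are inferred from the sample snippet rather than checked against the full file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class RuleResultFields
{
    // Output label -> XPath relative to one cdf:rule-result element.
    private static readonly (string Label, string XPath)[] Fields =
    {
        ("severity", "@severity"),
        ("weight", "@weight"),
        ("time", "@time"),
        ("version", "@version"),
        ("idref", "@idref"),
        ("result", "cdf:result"),
        ("checkIdentifier", "cdf:ident"),
        ("fixId", "cdf:fix/@id"),
        ("checkSchema", "cdf:check/@system"),
        ("checkName", "cdf:check/cdf:check-content-ref/@name"),
    };

    // Returns one "label = value" line per field, plus a blank line after each rule-result.
    public static List<string> Extract(TextReader xml)
    {
        var lines = new List<string>();
        XPathNavigator nav = new XPathDocument(xml).CreateNavigator();
        var nsmgr = new XmlNamespaceManager(nav.NameTable);
        nsmgr.AddNamespace("cdf", "http://checklists.nist.gov/xccdf/1.2");

        foreach (XPathNavigator ruleResult in nav.Select("/cdf:Benchmark/cdf:TestResult/cdf:rule-result", nsmgr))
        {
            foreach (var (label, xpath) in Fields)
            {
                // SelectSingleNode returns null when nothing matches; use an empty value then.
                XPathNavigator node = ruleResult.SelectSingleNode(xpath, nsmgr);
                string value = node?.Value ?? string.Empty;
                lines.Add($"{label} = {value}");
            }
            lines.Add(string.Empty);
        }
        return lines;
    }

    public static void Main()
    {
        // Path taken from the question's code.
        using (var reader = new StreamReader(@"C:\tmp\Example.xml"))
        {
            foreach (string line in Extract(reader)) Console.WriteLine(line);
        }
    }
}
```

SelectSingleNode returns only the first match, so a rule-result with more than one cdf:ident would report just the first; switching that entry to Select and joining the values would be one way around it.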
