PowerShell XML find element that is an exact match

Time:09-28

I am exporting uninstall data to XML, mostly for Autodesk products. Autodesk has a habit of duplicating things, such as multiple versions of the same software with different GUIDs that can't be installed side by side, where installing the update doesn't delete the old GUID. The way my XML works, I abstract the GUID to a variable in the uninstall string, so I can have one <UninstallProgram> element with the data needed to find and delete all instances. But of course I FIND that data in the registry twice, so my current code creates two elements. The net result is that I can end up with this element twice.

<UninstallProgram id="Lighting Analysis for Revit 2023">
    <Search>Lighting Analysis for Revit 2023</Search>
    <Filter>UninstallString -like *AdODIS*</Filter>
    <Resource>C:\ProgramData\Autodesk\ODIS\metadata</Resource>
    <Executable>C:\Program Files\Autodesk\AdODIS\V1\Installer.exe</Executable>
    <Arguments>-i uninstall --trigger_point system -m [Task~Resource]\[Task~GUID]\bundleManifest.xml -x [Task~Resource]\[Task~GUID]\SetupRes\manifest.xsd -q</Arguments>
</UninstallProgram>

What I am wondering is whether there is an easy way to take an element variable that has been created but not yet appended, and search for any other element that is exactly the same, including all attributes, child elements and element text. I know I can search for an element with the same ID, but if Autodesk does something weird and I somehow build a second element with the same ID but different contents, I want to append that so I can find it and start adapting my code to handle the new condition Autodesk has so kindly provided. I don't want to spend too much time or code. I have already looked at just iterating through the current XML, converting every single element to a string representation and comparing that to the string representation of the element being evaluated, but that gets ugly performance-wise, since the current XML gets larger and larger, and I would be doing this comparison hundreds of times. And the issue arises rarely enough that just manually editing the XML isn't THAT big of a deal. Ideally what I want is something that is part of XPath, so highly optimized, that allows for a conditional like

if ($xml.SelectSingleNode("NotMatch $newUninstallElement")){
    [Void]$rootElement.AppendChild($newUninstallElement)
}

CodePudding user response:

The following is by no means a robust solution, but it may be good enough for your use case:

  • Use .SelectNodes() with an XPath query based on the id attribute to find all candidate elements matching the lookup element.

  • Among the candidate elements, find the one(s) that matches the full content of the lookup element, via the .OuterXml property; see below for assumptions and limitations.

# Sample document.
# Note that the two UninstallProgram elements differ by the <Search> element value only.
[xml] $xmlDoc = @'
<xml>
<UninstallProgram id="Lighting Analysis for Revit 2023">
    <Search>Lighting Analysis for Revit 2023</Search>
    <Filter>UninstallString -like *AdODIS*</Filter>
    <Resource>C:\ProgramData\Autodesk\ODIS\metadata</Resource>
    <Executable>C:\Program Files\Autodesk\AdODIS\V1\Installer.exe</Executable>
    <Arguments>-i uninstall --trigger_point system -m [Task~Resource]\[Task~GUID]\bundleManifest.xml -x [Task~Resource]\[Task~GUID]\SetupRes\manifest.xsd -q</Arguments>
</UninstallProgram>
<UninstallProgram id="Lighting Analysis for Revit 2023">
    <Search>DIFFERS</Search>
    <Filter>UninstallString -like *AdODIS*</Filter>
    <Resource>C:\ProgramData\Autodesk\ODIS\metadata</Resource>
    <Executable>C:\Program Files\Autodesk\AdODIS\V1\Installer.exe</Executable>
    <Arguments>-i uninstall --trigger_point system -m [Task~Resource]\[Task~GUID]\bundleManifest.xml -x [Task~Resource]\[Task~GUID]\SetupRes\manifest.xsd -q</Arguments>
</UninstallProgram>
</xml>
'@

# A sample element to look for in the document.
# Note: The assumption is that its .OuterXml property has no incidental whitespace,
#       which is the case when the element is parsed via an [xml] cast.
$elem = ([xml] '<UninstallProgram id="Lighting Analysis for Revit 2023">
<Search>Lighting Analysis for Revit 2023</Search><Filter>UninstallString -like *AdODIS*</Filter><Resource>C:\ProgramData\Autodesk\ODIS\metadata</Resource><Executable>C:\Program Files\Autodesk\AdODIS\V1\Installer.exe</Executable><Arguments>-i uninstall --trigger_point system -m [Task~Resource]\[Task~GUID]\bundleManifest.xml -x [Task~Resource]\[Task~GUID]\SetupRes\manifest.xsd -q</Arguments>
</UninstallProgram>').DocumentElement

# Find all elements with the same ID using an XPath query, then
# compare each matching element's .OuterXml values to that of the lookup element.
$xmlDoc.
  SelectNodes(('//UninstallProgram[@id="{0}"]' -f $elem.id)).
  Where({ $_.OuterXml -ceq $elem.OuterXml })

The above finds only the first <UninstallProgram> element, because - while both have the same id attribute and are therefore matched by the XPath query passed to .SelectNodes() - only the first's content, as reflected in the .OuterXml property value, matches that of the lookup element.
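
To tie this back to the conditional append from the question, the lookup can be wrapped in a small helper. This is only a sketch: Test-ExactElementExists is a hypothetical name, the sample data is shortened, and the same whitespace and ordering assumptions listed in this answer apply.

```powershell
# Hypothetical helper (not part of the original answer): returns $true if the
# document already holds an element with the same name, the same id attribute,
# and an identical .OuterXml string.
function Test-ExactElementExists {
    param(
        [System.Xml.XmlDocument] $Document,
        [System.Xml.XmlElement]  $Element
    )
    # Narrow the candidates by id first, then compare the full content.
    $xpath = '//{0}[@id="{1}"]' -f $Element.LocalName, $Element.GetAttribute('id')
    foreach ($candidate in $Document.SelectNodes($xpath)) {
        if ($candidate.OuterXml -ceq $Element.OuterXml) { return $true }
    }
    return $false
}

# Shortened sample data, for illustration only.
[xml] $xmlDoc = '<xml><UninstallProgram id="A"><Search>A</Search></UninstallProgram></xml>'

# Same id and same content as the existing element: should be skipped.
$duplicate = ([xml] '<UninstallProgram id="A"><Search>A</Search></UninstallProgram>').DocumentElement
# Same id but different content: should be appended so it can be investigated.
$variant = ([xml] '<UninstallProgram id="A"><Search>B</Search></UninstallProgram>').DocumentElement

foreach ($newElement in $duplicate, $variant) {
    if (-not (Test-ExactElementExists -Document $xmlDoc -Element $newElement)) {
        # ImportNode is needed here only because the samples were parsed as
        # separate documents; an element made with $xmlDoc.CreateElement() can
        # be appended directly.
        [void] $xmlDoc.DocumentElement.AppendChild($xmlDoc.ImportNode($newElement, $true))
    }
}

$xmlDoc.SelectNodes('//UninstallProgram').Count   # 2
```

Because the XPath query filters on the id attribute first, the string comparison only runs against the few elements that share that id, not against the whole document.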

Assumptions:

  • Both the input document and the element to look up must be parsed with incidental whitespace removed; using an [xml] cast in PowerShell (to parse XML text into a System.Xml.XmlDocument instance) does that by default.

  • The target element's attributes, child elements, and their attributes must be in the same order in the input document and the lookup element.

  • If XML namespaces are involved, more work is needed.
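
As a rough illustration of the namespace point, the XPath part of the lookup then needs an XmlNamespaceManager. The namespace URI and the 'u' prefix below are made-up values for this sketch, and it covers only the query; whether the .OuterXml strings still line up depends on how the namespace declarations are written in both elements.

```powershell
# Sample document whose elements live in a default namespace.
# 'urn:example:uninstall' and the 'u' prefix are made-up names for this illustration.
[xml] $nsDoc = '<xml xmlns="urn:example:uninstall"><UninstallProgram id="A"><Search>A</Search></UninstallProgram></xml>'

# An unprefixed name in XPath 1.0 matches only elements in no namespace,
# so this query finds nothing.
$withoutManager = $nsDoc.SelectNodes('//UninstallProgram[@id="A"]')

# Map a prefix of our choosing to the namespace URI and use it in the query.
$nsManager = [System.Xml.XmlNamespaceManager]::new($nsDoc.NameTable)
$nsManager.AddNamespace('u', 'urn:example:uninstall')
$withManager = $nsDoc.SelectNodes('//u:UninstallProgram[@id="A"]', $nsManager)

$withoutManager.Count   # 0
$withManager.Count      # 1
```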

CodePudding user response:

Consider XPath's sibling, XSLT, to de-duplicate nodes using the Muenchian Method, where <xsl:key> builds a hash-table-like index on the document for efficient processing. PowerShell can run XSLT 1.0 with .NET's XslCompiledTransform class. Specifically, the stylesheet below runs the identity transform to copy the document as is, indexes each <UninstallProgram> by its string value (use=".", i.e. the concatenated text of its descendants; attribute values such as id are not part of that key), and keeps only the first <UninstallProgram> with a given key, along with its content.

XSLT (save as .xsl, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="txt" match="UninstallProgram" use="." />

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="UninstallProgram[generate-id() != generate-id(key('txt', .))]"/>

</xsl:stylesheet>

PowerShell

# Load the style sheet.
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform;
$xslt.Load("C:\Path\To\style.xsl");

# Execute the transform and output the results to a file.
$xslt.Transform("C:\Path\To\input.xml", "C:\Path\To\output.xml");
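
If writing to files is not convenient, the same transform can also be run in memory. The following is a sketch under assumptions: the sample input is shortened, the variable names are made up, and the input is passed as an XmlReader because, as far as I know, XslCompiledTransform refuses to apply xsl:strip-space to a document that has already been loaded (such as an [xml] instance).

```powershell
# The stylesheet from this answer, held in a here-string instead of a file.
$xsl = @'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="txt" match="UninstallProgram" use="." />
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="UninstallProgram[generate-id() != generate-id(key('txt', .))]"/>
</xsl:stylesheet>
'@

# Shortened sample input: two identical elements and one with a different <Search>.
$inputXml = '<xml><UninstallProgram id="A"><Search>A</Search></UninstallProgram>' +
            '<UninstallProgram id="A"><Search>A</Search></UninstallProgram>' +
            '<UninstallProgram id="A"><Search>B</Search></UninstallProgram></xml>'

# Load the stylesheet from the string.
$xslt = [System.Xml.Xsl.XslCompiledTransform]::new()
$xslReader = [System.Xml.XmlReader]::Create([System.IO.StringReader]::new($xsl))
$xslt.Load($xslReader)

# Transform from an XmlReader over the input text into a StringWriter.
$inputReader = [System.Xml.XmlReader]::Create([System.IO.StringReader]::new($inputXml))
$writer = [System.IO.StringWriter]::new()
$xslt.Transform($inputReader, [System.Xml.Xsl.XsltArgumentList]::new(), $writer)

# Parse the result back into a document for further work.
[xml] $resultDoc = $writer.ToString()
$resultDoc.SelectNodes('//UninstallProgram').Count   # 2
```

Note that because the key is the element's text content only, two elements with identical text but different attributes would be treated as duplicates by this stylesheet.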