xmlstarlet Remove nodes with duplicate attributes

Time:11-11

I am trying to delete duplicate entries in an XML file based on the value of an attribute.

<?xml version='1.0' encoding='UTF-8'?>
<root>
  <entries>
    <entry name="entry1">
      <value>1</value>
    </entry>
    <entry name="entry1">     <!-- Duplicate name here -->
      <value>2</value>
    </entry>
    <entry name="entry2">
      <value>3</value>
    </entry>
  </entries>
</root>

And I want the following

<?xml version='1.0' encoding='UTF-8'?>
<root>
  <entries>
    <entry name="entry1">
      <value>1</value>
    </entry>
    <entry name="entry2">
      <value>3</value>
    </entry>
  </entries>
</root>

I have tried

xmlstarlet edit --delete '/_:root/_:entries/*[@name = .//preceding-sibling::*/@name]'

But the XPath does not match the previous entry with the attribute name="entry1".

CodePudding user response:

If you can use the YAML processor mikefarah/yq, your task can be solved in this way:

Version < 4.30

yq --input-format xml --output-format xml e '.root.entries.entry |= unique_by(.+name)'

Version >= 4.30

yq --input-format xml --output-format xml e '.root.entries.entry |= unique_by(.+@name)'

Output

<?xml version='1.0' encoding='UTF-8'?>
<root>
  <entries>
    <entry name="entry1">
      <value>1</value>
    </entry>
    <entry name="entry2">
      <value>3</value>
    </entry>
  </entries>
</root>

This solution has the benefit that the order of the elements in the array is irrelevant.

CodePudding user response:

You just need to remove the .// in front of preceding-sibling::.

Like this:

xmlstarlet edit --delete '/_:root/_:entries/*[@name = preceding-sibling::*/@name]' input.xml

Note: To test this, I had to add a default namespace to the XML so that the _: namespace prefix would match. If your actual input has no default namespace, remove the _: prefixes from your XPath.
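
To make the predicate's effect concrete, here is a minimal sketch of the same rule (delete every element whose name attribute already appeared on an earlier sibling) written with Python's standard library rather than xmlstarlet. The helper name drop_duplicate_names and the SAMPLE string are made up for illustration; SAMPLE is the input from the question, assumed to have no default namespace.

```python
import xml.etree.ElementTree as ET

# Input from the question, assumed here to carry no default namespace.
SAMPLE = """<root>
  <entries>
    <entry name="entry1">
      <value>1</value>
    </entry>
    <entry name="entry1">
      <value>2</value>
    </entry>
    <entry name="entry2">
      <value>3</value>
    </entry>
  </entries>
</root>"""


def drop_duplicate_names(parent):
    """Remove each child whose name attribute occurred on an earlier sibling."""
    seen = set()
    for child in list(parent):  # iterate over a copy because we remove while looping
        name = child.get("name")
        if name is None:
            continue  # no name attribute: the XPath predicate would not match either
        if name in seen:
            parent.remove(child)  # an earlier sibling already used this name
        else:
            seen.add(name)


root = ET.fromstring(SAMPLE)
drop_duplicate_names(root.find("entries"))
remaining = [(e.get("name"), e.findtext("value")) for e in root.iter("entry")]
print(remaining)  # [('entry1', '1'), ('entry2', '3')]
```

As with the XPath, the first element carrying a given name is kept and later ones are dropped, whether or not the duplicates are adjacent.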
