I am trying to delete duplicate entries in an XML file based on the value of an attribute. Given this input:
<?xml version='1.0' encoding='UTF-8'?>
<root>
<entries>
<entry name="entry1">
<value>1</value>
</entry>
<entry name="entry1"> <!-- Duplicate name here -->
<value>2</value>
</entry>
<entry name="entry2">
<value>3</value>
</entry>
</entries>
</root>
I want the following output:
<?xml version='1.0' encoding='UTF-8'?>
<root>
<entries>
<entry name="entry1">
<value>1</value>
</entry>
<entry name="entry2">
<value>3</value>
</entry>
</entries>
</root>
I have tried
xmlstarlet edit --delete '/_:root/_:entries/*[@name = .//preceding-sibling::*/@name]'
But the XPath does not match the previous entry with the attribute name="entry1".
CodePudding user response:
If you can use the YAML processor mikefarah/yq, your task can be solved in this way:
Version < 4.30
yq --input-format xml --output-format xml e '.root.entries.entry |= unique_by(.+name)'
Version >= 4.30
yq --input-format xml --output-format xml e '.root.entries.entry |= unique_by(.+@name)'
Output
<?xml version='1.0' encoding='UTF-8'?>
<root>
<entries>
<entry name="entry1">
<value>1</value>
</entry>
<entry name="entry2">
<value>3</value>
</entry>
</entries>
</root>
This solution has the benefit that the order of the elements in the array is irrelevant.
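To make the "keep the first item per key" idea concrete, here is a minimal Python sketch of what a unique_by-style filter does. It is not yq's implementation. The helper name unique_by and the list of dicts are hypothetical stand-ins for the parsed entry elements, and the first-occurrence behaviour is assumed from the output shown above.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def unique_by(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first item seen for each key, preserving input order."""
    seen = set()
    kept = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            kept.append(item)
    return kept


# Hypothetical stand-in for the parsed <entry> elements.
# The duplicate "entry1" is deliberately not adjacent to the first one.
entries = [
    {"name": "entry1", "value": 1},
    {"name": "entry2", "value": 3},
    {"name": "entry1", "value": 2},
]

deduped = unique_by(entries, key=lambda e: e["name"])
print(deduped)
```

Because the helper tracks keys in a set, a duplicate is dropped wherever it appears in the list, not only when it directly follows the first occurrence.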
CodePudding user response:
You just need to remove the .// in front of preceding-sibling:: like this:
xmlstarlet edit --delete '/_:root/_:entries/*[@name = preceding-sibling::*/@name]' input.xml
Note: I tested this, but I had to add a default namespace to the XML so that the _ namespace prefix would work. If you don't have a default namespace in your actual input, remove the _: prefixes from your XPath.
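If xmlstarlet is not available, the same rule (delete every child of entries whose name attribute already appeared on an earlier sibling) can be sketched with Python's standard library. This is only an illustration under some assumptions: the input has no default namespace, as in the sample from the question, and the function name drop_duplicate_names and the SAMPLE constant are made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample input from the question, without a default namespace (assumption).
SAMPLE = """<root>
<entries>
<entry name="entry1">
<value>1</value>
</entry>
<entry name="entry1">
<value>2</value>
</entry>
<entry name="entry2">
<value>3</value>
</entry>
</entries>
</root>"""


def drop_duplicate_names(xml_text: str) -> str:
    """Remove children of <entries> whose name was already used by an earlier sibling."""
    root = ET.fromstring(xml_text)
    entries = root.find("entries")
    seen = set()
    # Iterate over a copy because children are removed while looping.
    for child in list(entries):
        name = child.get("name")
        if name is None:
            continue  # elements without a name attribute are left alone
        if name in seen:
            entries.remove(child)
        else:
            seen.add(name)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(drop_duplicate_names(SAMPLE))
```

Whitespace in the result may differ slightly from the xmlstarlet output, since removing an element also removes the text that follows it up to the next sibling.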