How can I check for duplicate values in List of XElement

Time:06-09

I am writing a few lines of code to read an XML file, get a collection of elements, add them to a list then check for duplicates. Should be simple but I can't get it working.

Here is the XML that I read (an extract, for simplicity). Note that the first and third entries are the same; these are the ones I want to identify:

<pmEntry>
      <dmRef>
        <dmRefIdent>
          <dmCode modelCode="CRAFT123" systemCode="B" anotherCode="63" infoCode="010" />
          <issueInfo issueNumber="001" inWork="00" />
        </dmRefIdent>
        <dmRefAddressItems>
          <dmTitle>
            <techName>My data</techName>
            <infoName>General data</infoName>
          </dmTitle>
        </dmRefAddressItems>
      </dmRef>
      <dmRef>
        <dmRefIdent>
          <dmCode modelCode="CRAFT789" systemCode="B" anotherCode="50" infoCode="500" />
          <issueInfo issueNumber="001" inWork="00" />
        </dmRefIdent>
        <dmRefAddressItems>
          <dmTitle>
            <techName>Some other data</techName>
            <infoName>Technical data</infoName>
          </dmTitle>
        </dmRefAddressItems>
      </dmRef>
      <dmRef>
        <dmRefIdent>
          <dmCode modelCode="CRAFT123" systemCode="B" anotherCode="63" infoCode="010" />
          <issueInfo issueNumber="001" inWork="00" />
        </dmRefIdent>
        <dmRefAddressItems>
          <dmTitle>
            <techName>My data</techName>
            <infoName>General data</infoName>
          </dmTitle>
        </dmRefAddressItems>
      </dmRef>
</pmEntry>

Here is the method, which takes the path of the XML file.

private void CheckPMforDuplicates(string path)
        {
            XDocument doc = XDocument.Load(path);

            List<XElement> DMList = new List<XElement>();

            var DMs = doc.Descendants("dmRefIdent");

            if (DMs != null) 
            {
                foreach (var dm in DMs)
                {
                    DMList.Add(dm);
                }

                var duplicates = DMList
                .GroupBy(i => i.Element("dmCode"))
                .Where(g => g.Elements("dmCode").Count() > 1)
                .Select(g => g.Key);

                if (duplicates != null)
                {
                    string duplicateDMstring = "";

                    foreach (var dup in duplicates)
                    {
                        duplicateDMstring = duplicateDMstring + ",\r\n " + dup;
                    }

                    if(duplicateDMstring == "")
                    {
                        MessageBox.Show("No duplicates");
                    }
                    else
                    {
                        MessageBox.Show("Duplicates are " + duplicateDMstring);
                    }
                }

             }
        }

If I change the LINQ query to look for a count equal to 1 (i.e. "== 1"), it presents me with a nice list of elements in the message box, as expected. But for some reason it will not find duplicates.

It's clearly a LINQ problem, but I can't get it to display the two duplicate entries.

CodePudding user response:

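It's not a LINQ problem, it's an equality problem. You group by Element("dmCode"), which is an XElement, a reference type, so by default GroupBy compares references and every element ends up in its own group. To compare the contents of the element instead, group by a value such as i.Element("dmCode").ToString(). Note that i.Element("dmCode").Value will not help here: dmCode carries its data in attributes and has no text content, so Value is an empty string for every element.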

            var duplicates = DMList
            .GroupBy(i => i.Element("dmCode").ToString())
            .Where(g => g.Count() > 1)
            .Select(g => g.Key);

or provide your own IEqualityComparer.
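
As a rough sketch of the comparer route, the following treats two dmCode elements as equal when their four attributes match. The names DmCodeComparer and DuplicateFinder are made up for illustration, and the attribute list is taken from the sample XML in the question; adjust it if your real data has more identifying attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical comparer: two dmCode elements are equal when the
// attributes listed in Names all have the same values.
sealed class DmCodeComparer : IEqualityComparer<XElement>
{
    // Attribute names taken from the sample XML (an assumption about the real data).
    private static readonly string[] Names =
        { "modelCode", "systemCode", "anotherCode", "infoCode" };

    // Builds a comparable string key; a missing attribute counts as empty.
    private static string Key(XElement e) =>
        string.Join("|", Names.Select(n => (string)e.Attribute(n) ?? ""));

    public bool Equals(XElement x, XElement y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return Key(x) == Key(y);
    }

    public int GetHashCode(XElement e) => e is null ? 0 : Key(e).GetHashCode();
}

static class DuplicateFinder
{
    // Returns one dmCode element per group that occurs more than once.
    public static List<XElement> FindDuplicates(XDocument doc) =>
        doc.Descendants("dmRefIdent")
           .GroupBy(i => i.Element("dmCode"), new DmCodeComparer())
           .Where(g => g.Count() > 1)
           .Select(g => g.Key)
           .ToList();

    static void Main()
    {
        // Cut-down version of the XML from the question.
        var doc = XDocument.Parse(@"<pmEntry>
  <dmRef><dmRefIdent><dmCode modelCode='CRAFT123' systemCode='B' anotherCode='63' infoCode='010' /></dmRefIdent></dmRef>
  <dmRef><dmRefIdent><dmCode modelCode='CRAFT789' systemCode='B' anotherCode='50' infoCode='500' /></dmRefIdent></dmRef>
  <dmRef><dmRefIdent><dmCode modelCode='CRAFT123' systemCode='B' anotherCode='63' infoCode='010' /></dmRefIdent></dmRef>
</pmEntry>");

        foreach (var dup in FindDuplicates(doc))
            Console.WriteLine(dup);
    }
}
```

With the sample data this should report a single duplicate, the CRAFT123 dmCode. Unlike ToString(), this approach ignores attribute order and any attributes not listed in Names, which may or may not be what you want.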

See also: How to use LINQ Group by on XElement attribute
