I have a large dataset that contains duplicates records. I've been trying to use XQuery to return results minus the duplicates. I have been reading around and thought fn:duplicate-values would allow me to get what I want...and it still might but I haven't quite achieved what I want yet.
---Sample xml data---
<data>
<Details Level="2">
<Record SectionNumber="0">
<Field Name="GroupNameLINKID1" FieldName="GroupName ({VW_PEOPLE.LINK_ID})">
<FormattedValue>76.00</FormattedValue>
<Value>76.00</Value>
</Field>
<Field Name="BRIEFBIO1" FieldName="{VW_PEOPLE.BRIEF_BIO}">
<FormattedValue>1833 - 1883</FormattedValue>
<Value>1833 - 1883</Value>
</Field>
<Field Name="BIRTHDATE1" FieldName="{VW_PEOPLE.BIRTH_DATE}">
<FormattedValue>1833</FormattedValue>
<Value>1833</Value>
</Field>
<Field Name="DEATHDATE1" FieldName="{VW_PEOPLE.DEATH_DATE}">
<FormattedValue>1883</FormattedValue>
<Value>1883</Value>
</Field>
</Record>
</Details>
<Details Level="2">
<Record SectionNumber="0">
<Field Name="GroupNameLINKID1" FieldName="GroupName ({VW_PEOPLE.LINK_ID})">
<FormattedValue>76.00</FormattedValue>
<Value>76.00</Value>
</Field>
<Field Name="BRIEFBIO1" FieldName="{VW_PEOPLE.BRIEF_BIO}">
<FormattedValue>1833 - 1883</FormattedValue>
<Value>1833 - 1883</Value>
</Field>
<Field Name="BIRTHDATE1" FieldName="{VW_PEOPLE.BIRTH_DATE}">
<FormattedValue>1833</FormattedValue>
<Value>1833</Value>
</Field>
<Field Name="DEATHDATE1" FieldName="{VW_PEOPLE.DEATH_DATE}">
<FormattedValue>1883</FormattedValue>
<Value>1883</Value>
</Field>
</Record>
</Details>
<Details Level="2">
<Record SectionNumber="0">
<Field Name="GroupNameLINKID1" FieldName="GroupName ({VW_PEOPLE.LINK_ID})">
<FormattedValue>3,383.00</FormattedValue>
<Value>3,383.00</Value>
</Field>
<Field Name="BRIEFBIO1" FieldName="{VW_PEOPLE.BRIEF_BIO}">
<FormattedValue>1819 - 1894</FormattedValue>
<Value>1819 - 1894</Value>
</Field>
<Field Name="BIRTHDATE1" FieldName="{VW_PEOPLE.BIRTH_DATE}">
<FormattedValue>1819</FormattedValue>
<Value>1819</Value>
</Field>
<Field Name="DEATHDATE1" FieldName="{VW_PEOPLE.DEATH_DATE}">
<FormattedValue>1894</FormattedValue>
<Value>1894</Value>
</Field>
</Record>
</Details>
</data>
You will see in each record a 'GroupNameLINKID1'. The first 2 records are the same. What I would like is to remove the duplicate record. Here is my XQuery:
<report>
{
let $item := doc("People_list_test_edited.xml")/data/Details
let $id := $item/Record/Field[@Name='GroupNameLINKID1']/FormattedValue
for $unique-id in distinct-values($id)
let $briefbio := $item/Record/Field[@Name='BRIEFBIO1']/FormattedValue
let $birth := $item/Record/Field[@Name='BIRTHDATE1']/FormattedValue
let $death := $item/Record/Field[@Name='DEATHDATE1']/FormattedValue
return
<object><id>{$unique-id}</id></object>
}
</report>
My current output looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<report>
<object>
<id>76.00</id>
</object>
<object>
<id>3,383.00</id>
</object>
</report>
What I want is:
<?xml version="1.0" encoding="UTF-8"?>
<report>
<object>
<id>76.00</id>
<briefbio>1833 - 1883</briefbio>
<birth>1833</birth>
<death>1883</death>
</object>
<object>
<id>3,383.00</id>
<briefbio>1819 - 1894</briefbio>
<birth>1819</birth>
<death>1894</death>
</object>
</report>
Whenever I have tried adding additional 'where' or 'for' statements I either get returned results with all the duplicate records included or I get a huge list of repeated briefbio's, birth, and death dates under each $unique-id.
Can anyone steer me in the right direction? Many thanks in advance! All the best, Rob
CodePudding user response:
Here is how to do with with XQuery 3 and grouping:
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'xml';
declare option output:indent 'yes';
<report>
{
for $detail in data/Details
group by $gn := $detail/Record/Field[@Name='GroupNameLINKID1']/FormattedValue
return
<object>
<id>{$gn}</id>
<briefbio>{$detail[1]/Record/Field[@Name='BRIEFBIO1']/FormattedValue/data()}</briefbio>
</object>
}
</report>
I have not spelled out all fields but from briefbio
it should be clear how to fill in the other.