I have the following problem. I need to write an application in C# that will read a given XML and prepare data for me to load into the database. In XML, the structure of which I have no influence, the main data is placed in CDATA. I have verified that the structure of this data is ordered in the correct XML structure.
I've searched hundreds of posts and can't find any solution from them. Below is the XML file from which I need to extract the data from the CDATA section. Maybe one of you can help me?
<Docs>
<Doc>
<Content>
<![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="1" Description="Desc1"></Header>
<Poss>
<Pos Id="1" Name="Pos1"></Pos>
<Pos Id="2" Name="Pos2"></Pos>
</Poss>
</Doc>
]]>
</Content>
</Doc>
<Doc>
<Content>
<![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="2" Description="Desc2"></Header>
<Poss>
<Pos Id="3" Name="Pos3"></Pos>
<Pos Id="4" Name="Pos4"></Pos>
</Poss>
</Doc>
]]>
</Content>
</Doc>
</Docs>
For me, the most important are the fields contained in the Content section and I have to load them as data into the database.
CodePudding user response:
Here is implementation based on a stored procedure with the XML parameter like in my comments.
I had to remove the <Poss>
XML element to make CData section a well-formed XML.
SQL
DECLARE @xml XML =
N'<Docs>
<Doc>
<Content><![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="1" Description="Desc1"></Header>
<Pos Id="1" Name="Pos1"></Pos>
<Pos Id="2" Name="Pos2"></Pos>
</Doc>
]]>
</Content>
</Doc>
<Doc>
<Content><![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="2" Description="Desc2"></Header>
<Pos Id="3" Name="Pos3"></Pos>
<Pos Id="4" Name="Pos4"></Pos>
</Doc>
]]>
</Content>
</Doc>
</Docs>';
--INSERT INTO <targetTable>
SELECT h.value('(Header/@DocNumber)[1]', 'INT') AS DocNumber
, h.value('(Header/@Description)[1]', 'VARCHAR(256)') AS DocDescription
, d.value('@Id', 'INT') AS posID
, d.value('@Name', 'VARCHAR(256)') AS posName
FROM @xml.nodes('/Docs/Doc/Content/text()') AS t(c)
CROSS APPLY (SELECT TRY_CAST(c.query('.').value('.', 'NVARCHAR(MAX)') AS XML)) AS t1(x)
CROSS APPLY x.nodes('/Doc') AS t2(h)
CROSS APPLY h.nodes('Pos') AS t3(d);
Output
DocNumber | DocDescription | posID | posName |
---|---|---|---|
2 | Desc2 | 3 | Pos3 |
2 | Desc2 | 4 | Pos4 |
1 | Desc1 | 1 | Pos1 |
1 | Desc1 | 2 | Pos2 |
CodePudding user response:
To extract the data from the CData part,
- Construct the classes.
public class Doc
{
public Header Header { get; set; }
[XmlArrayItem(typeof(Pos), ElementName = "Pos")]
public List<Pos> Poss { get; set; }
}
public class Header
{
[XmlAttribute]
public int DocNumber { get; set; }
[XmlAttribute]
public string Description { get; set; }
}
public class Pos
{
[XmlAttribute]
public int Id { get; set; }
[XmlAttribute]
public string Name { get; set; }
}
Implement the extraction logic.
2.1. Read the XML string as
XDocument
viaXDocument.Parse()
.2.2. Select the
DescendantNodes
for the XPath of "/Docs/Doc/Content".2.3. Convert the nodes to
XCData
type.2.4. With
XmlSerializer
to deserialize the value ofXCData
toDoc
type.
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
using System.Xml.Serialization;
using System.IO;
XmlSerializer xmlSerializer = new XmlSerializer(typeof(Doc));
XDocument xDoc = XDocument.Parse(xml);
var cdataSections = xDoc.XPathSelectElements("/Docs/Doc/Content")
.DescendantNodes()
.OfType<XCData>()
.Select(x => (Doc)xmlSerializer.Deserialize(new StringReader(x.Value)))
.ToList();