I have three columns in the table with the third one being the valid XML type. A sample of the XML is listed below which includes several tags. This XML is downloaded from the IRS forms 990. I am trying to get a handful of tags to return in a tabular format. I’ve done this query below and it runs but it doesn’t return any values. I’m probably doing something wrong with the path but I can’t seem to figure it out.
Here is my table structure:
CREATE TABLE dbo.XMLFilesTable
(
Id INT IDENTITY PRIMARY KEY,
FileName VARCHAR(100),
XMLData XML,
LoadedDateTime DATETIME
)
GO
Here is the xml from the xml column:
<?xml version="1.0" encoding="utf-8"?>
<Return xmlns="http://www.irs.gov/efile" returnVersion="2019v5.1">
<ReturnData documentCnt="7">
<IRS990 documentId="IRS990">
<PrincipalOfficerNm>Fr Francis Pizzarelli</PrincipalOfficerNm>
<USAddress>
<AddressLine1Txt>One High Street</AddressLine1Txt>
<CityNm>Port Jefferson</CityNm>
<StateAbbreviationCd>NY</StateAbbreviationCd>
<ZIPCd>11777</ZIPCd>
</USAddress>
<GrossReceiptsAmt>6463631</GrossReceiptsAmt>
<GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
<Organization501c3Ind>X</Organization501c3Ind>
<TypeOfOrganizationCorpInd>X</TypeOfOrganizationCorpInd>
<FormationYr>1980</FormationYr>
<LegalDomicileStateCd>NY</LegalDomicileStateCd>
<ActivityOrMissionDesc>To provide help to individuals who can not find help in the form of housing, counseling, and other support so that they can eventually live productive independent lives. Hope House provides hope, care and compassion to nearly 2,000 individuals in need each month.</ActivityOrMissionDesc>
<VotingMembersGoverningBodyCnt>8</VotingMembersGoverningBodyCnt>
<VotingMembersIndependentCnt>7</VotingMembersIndependentCnt>
<TotalEmployeeCnt>122</TotalEmployeeCnt>
<TotalGrossUBIAmt>0</TotalGrossUBIAmt>
<NetUnrelatedBusTxblIncmAmt>0</NetUnrelatedBusTxblIncmAmt>
<PYContributionsGrantsAmt>3307653</PYContributionsGrantsAmt>
<CYContributionsGrantsAmt>4963545</CYContributionsGrantsAmt>
<CYProgramServiceRevenueAmt>0</CYProgramServiceRevenueAmt>
<PYInvestmentIncomeAmt>25158</PYInvestmentIncomeAmt>
<CYInvestmentIncomeAmt>122678</CYInvestmentIncomeAmt>
<PYOtherRevenueAmt>1302778</PYOtherRevenueAmt>
<CYOtherRevenueAmt>1016131</CYOtherRevenueAmt>
<PYTotalRevenueAmt>4635589</PYTotalRevenueAmt>
<CYTotalRevenueAmt>6102354</CYTotalRevenueAmt>
<CYGrantsAndSimilarPaidAmt>0</CYGrantsAndSimilarPaidAmt>
<CYBenefitsPaidToMembersAmt>0</CYBenefitsPaidToMembersAmt>
<PYSalariesCompEmpBnftPaidAmt>3294184</PYSalariesCompEmpBnftPaidAmt>
<CYSalariesCompEmpBnftPaidAmt>3352675</CYSalariesCompEmpBnftPaidAmt>
<CYTotalProfFndrsngExpnsAmt>0</CYTotalProfFndrsngExpnsAmt>
<CYTotalFundraisingExpenseAmt>501828</CYTotalFundraisingExpenseAmt>
<PYOtherExpensesAmt>1793710</PYOtherExpensesAmt>
I tried the following code and it runs but returns nothing:
DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX), @rootxmlns VARCHAR(100)
SELECT @XML = XMLData FROM XMLFilesTable
SET @rootxmlns = '<Return xmlns="http://www.irs.gov/efile"/'
EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML
SELECT PYContributionsGrantsAmt, CYContributionsGrantsAmt, PYInvestmentIncomeAmt
FROM OPENXML(@hDoc, 'Return/ReturnData')
WITH
(
PYContributionsGrantsAmt [BIGINT] 'PYContributionsGrantsAmt',
CYContributionsGrantsAmt [BIGINT] 'CYContributionsGrantsAmt',
PYInvestmentIncomeAmt [BIGINT] 'PYInvestmentIncomeAmt'
)
EXEC sp_xml_removedocument @hDoc
GO
My thoughts are that the path after the open xml is not right.
Any thoughts on this?
CodePudding user response:
There are several issues in the posted code...
SET @rootxmlns = '<Return xmlns="http://www.irs.gov/efile"/'
is not a self-closing tag and is also attempting to define a default namespace which is problematic with OPENXML.EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML
doesn't include the@rootxmlns
parameter, so the efile namespace isn't used for querying.OPENXML(@hDoc, 'Return/ReturnData')
doesn't include theIRS990
element in the XPath.
Correcting for these issues a result can be obtained with the following:
DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX), @rootxmlns VARCHAR(100)
SELECT @XML = XMLData FROM XMLFilesTable
SET @rootxmlns = '<Return xmlns:example="http://www.irs.gov/efile"/>'
EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML, @rootxmlns
SELECT PYContributionsGrantsAmt, CYContributionsGrantsAmt, PYInvestmentIncomeAmt
FROM OPENXML(@hDoc, 'example:Return/example:ReturnData/example:IRS990')
WITH
(
PYContributionsGrantsAmt [BIGINT] 'example:PYContributionsGrantsAmt',
CYContributionsGrantsAmt [BIGINT] 'example:CYContributionsGrantsAmt',
PYInvestmentIncomeAmt [BIGINT] 'example:PYInvestmentIncomeAmt'
)
EXEC sp_xml_removedocument @hDoc
Which yields the result:
PYContributionsGrantsAmt | CYContributionsGrantsAmt | PYInvestmentIncomeAmt |
---|---|---|
3307653 | 4963545 | 25158 |
Compare the above SQL code with a nodes()
and values()
implementation which not only allows you to specify the default namespace using with xmlnamspaces
but can also be used in a set-based fashion to shred the XML for multiple rows in the XMLFilesTable
table...
with xmlnamespaces (default 'http://www.irs.gov/efile')
select
IRS990.value('(PYContributionsGrantsAmt/text())[1]', 'bigint') as PYContributionsGrantsAmt,
IRS990.value('(CYContributionsGrantsAmt/text())[1]', 'bigint') as CYContributionsGrantsAmt,
IRS990.value('(PYInvestmentIncomeAmt/text())[1]', 'bigint') as PYInvestmentIncomeAmt
from dbo.XMLFilesTable
cross apply XMLData.nodes('/Return/ReturnData/IRS990') ReturnData(IRS990);
CodePudding user response:
Microsoft proprietary OPENXML
and its companions sp_xml_preparedocument
and sp_xml_removedocument
are kept just for backward compatibility with the obsolete SQL Server 2000. Their use is diminished just to very few fringe cases.
Starting from SQL Server 2005 onwards, it is strongly recommended to re-write your SQL and switch it to XQuery.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, XMLData XML);
INSERT INTO @tbl (XMLData) VALUES
('<?xml version="1.0" encoding="utf-8"?>
<Return xmlns="http://www.irs.gov/efile" returnVersion="2019v5.1">
<ReturnData documentCnt="7">
<IRS990 documentId="IRS990">
<PrincipalOfficerNm>Fr Francis Pizzarelli</PrincipalOfficerNm>
<USAddress>
<AddressLine1Txt>One High Street</AddressLine1Txt>
<CityNm>Port Jefferson</CityNm>
<StateAbbreviationCd>NY</StateAbbreviationCd>
<ZIPCd>11777</ZIPCd>
</USAddress>
<GrossReceiptsAmt>6463631</GrossReceiptsAmt>
<GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
<Organization501c3Ind>X</Organization501c3Ind>
<TypeOfOrganizationCorpInd>X</TypeOfOrganizationCorpInd>
<FormationYr>1980</FormationYr>
<LegalDomicileStateCd>NY</LegalDomicileStateCd>
<ActivityOrMissionDesc>To provide help to individuals who can not find help in the form of housing, counseling, and other support so that they can eventually live productive independent lives. Hope House provides hope, care and compassion to nearly 2,000 individuals in need each month.</ActivityOrMissionDesc>
<VotingMembersGoverningBodyCnt>8</VotingMembersGoverningBodyCnt>
<VotingMembersIndependentCnt>7</VotingMembersIndependentCnt>
<TotalEmployeeCnt>122</TotalEmployeeCnt>
<TotalGrossUBIAmt>0</TotalGrossUBIAmt>
<NetUnrelatedBusTxblIncmAmt>0</NetUnrelatedBusTxblIncmAmt>
<PYContributionsGrantsAmt>3307653</PYContributionsGrantsAmt>
<CYContributionsGrantsAmt>4963545</CYContributionsGrantsAmt>
<CYProgramServiceRevenueAmt>0</CYProgramServiceRevenueAmt>
<PYInvestmentIncomeAmt>25158</PYInvestmentIncomeAmt>
<CYInvestmentIncomeAmt>122678</CYInvestmentIncomeAmt>
<PYOtherRevenueAmt>1302778</PYOtherRevenueAmt>
<CYOtherRevenueAmt>1016131</CYOtherRevenueAmt>
<PYTotalRevenueAmt>4635589</PYTotalRevenueAmt>
<CYTotalRevenueAmt>6102354</CYTotalRevenueAmt>
<CYGrantsAndSimilarPaidAmt>0</CYGrantsAndSimilarPaidAmt>
<CYBenefitsPaidToMembersAmt>0</CYBenefitsPaidToMembersAmt>
<PYSalariesCompEmpBnftPaidAmt>3294184</PYSalariesCompEmpBnftPaidAmt>
<CYSalariesCompEmpBnftPaidAmt>3352675</CYSalariesCompEmpBnftPaidAmt>
<CYTotalProfFndrsngExpnsAmt>0</CYTotalProfFndrsngExpnsAmt>
<CYTotalFundraisingExpenseAmt>501828</CYTotalFundraisingExpenseAmt>
<PYOtherExpensesAmt>1793710</PYOtherExpensesAmt>
</IRS990>
</ReturnData>
</Return>');
WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT ID
, PYContributionsGrantsAmt = c.value('(PYContributionsGrantsAmt/text())[1]', 'BIGINT')
, CYContributionsGrantsAmt = c.value('(CYContributionsGrantsAmt/text())[1]', 'BIGINT')
, PYInvestmentIncomeAmt = c.value('(PYInvestmentIncomeAmt /text())[1]', 'BIGINT')
FROM @tbl
CROSS APPLY XMLData.nodes('/Return/ReturnData/IRS990') AS t(c);