Home > Enterprise >  Parse XML - Retrieve the Portion Between the Double Quotes
Parse XML - Retrieve the Portion Between the Double Quotes

Time:03-10

I have the following XML that is in an XML column in SQL Server. I am able to retrieve the data between the tags and list it in table format using the code at the bottom. I can retrieve the values between all the tags except for the one I have in bold below that is in double quotes. I can get the value X just fine but I need to get the 6 that is in between the double quotes in this part: <Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>

WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT ID, FilingYear, FilingPeriod, FilingType, [FileName]

, Organization501c3Ind = c.value('(//Organization501c3Ind/text())[1]','varchar(MAX)')
, Organization501cInd = c.value('(//Organization501cInd/text())[1]','varchar(MAX)')
, Organization501cTypeTxt = c.value('(//Organization501cTypeTxt/text())[1]','varchar(MAX)')


FROM Form990
    CROSS APPLY XMLData.nodes('//Return') AS t(c)
    CROSS APPLY XMLData.nodes('//Return/ReturnHeader/Filer') AS t2(c2)

XML:

<ReturnData documentCnt="2">
    <IRS990 documentId="IRS990-01" referenceDocumentId="IRS990ScheduleO-01" referenceDocumentName="IRS990ScheduleO ReasonableCauseExplanation" softwareId="19009670">
      <PrincipalOfficerNm>CAREY BAKER</PrincipalOfficerNm>
      <USAddress>
        <AddressLine1Txt>PO BOX 11275</AddressLine1Txt>
        <CityNm>TALLAHASSEE</CityNm>
        <StateAbbreviationCd>FL</StateAbbreviationCd>
        <ZIPCd>32302</ZIPCd>
      </USAddress>
      <GrossReceiptsAmt>104241</GrossReceiptsAmt>
      <GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
      <Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>

Thoughts?

CodePudding user response:

Without a minimal reproducible example by the OP, shooting from the hip.

SQL

-- DDL and sample data population, start
DECLARE @Form990 TABLE (ID INT IDENTITY PRIMARY KEY, XMLData XML);
INSERT INTO @Form990(XMLData) VALUES
(N'<Return xmlns="http://www.irs.gov/efile" returnVersion="2019v5.1">
    <ReturnData documentCnt="2">
        <IRS990 documentId="IRS990-01" referenceDocumentId="IRS990ScheduleO-01" referenceDocumentName="IRS990ScheduleO ReasonableCauseExplanation" softwareId="19009670">
            <PrincipalOfficerNm>CAREY BAKER</PrincipalOfficerNm>
            <USAddress>
                <AddressLine1Txt>PO BOX 11275</AddressLine1Txt>
                <CityNm>TALLAHASSEE</CityNm>
                <StateAbbreviationCd>FL</StateAbbreviationCd>
                <ZIPCd>32302</ZIPCd>
            </USAddress>
            <GrossReceiptsAmt>104241</GrossReceiptsAmt>
            <GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
            <Organization501c3Ind>X</Organization501c3Ind>
            <Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>
        </IRS990>
    </ReturnData>
</Return>');
-- DDL and sample data population, end

WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT -- ID, FilingYear, FilingPeriod, FilingType, [FileName]
    Organization501c3Ind = c.value('(Organization501c3Ind/text())[1]','varchar(MAX)')
    , Organization501cInd = c.value('(Organization501cInd/text())[1]','varchar(MAX)')
    , Organization501cTypeTxt = c.value('(Organization501cInd/@organization501cTypeTxt)[1]','varchar(MAX)')
FROM @Form990
    CROSS APPLY XMLData.nodes('/Return/ReturnData/IRS990') AS t(c)

Output

 ---------------------- --------------------- ------------------------- 
| Organization501c3Ind | Organization501cInd | Organization501cTypeTxt |
 ---------------------- --------------------- ------------------------- 
| X                    | X                   |                       6 |
 ---------------------- --------------------- ------------------------- 

CodePudding user response:

NOTE: XML element and attribute names are case-sensitive. i.e.: Organization501cTypeTxt will not match an attribute named organization501cTypeTxt.

When extracting attributes you need to use the @ accessor in your XPath query. Try something like the following...

WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT  ID, FilingYear, FilingPeriod, FilingType, [FileName],
    Organization501cInd = c2.value('(Organization501cInd/text())[1]','varchar(MAX)'),
    organization501cTypeTxt = c2.value('(Organization501cInd/@organization501cTypeTxt)[1]','varchar(MAX)')
FROM Form990
    CROSS APPLY XMLData.nodes('/ReturnData') AS t(c)
    CROSS APPLY t.c.nodes('IRS990') AS t2(c2);
  • Related