Home > Enterprise >  Parsing nested complex XML to CSV using powershell
Parsing nested complex XML to CSV using powershell

Time:12-15

Good day,

I am new to powershell and i am trying to parse a complex xml to CSV:

Here is the xml code

<LoyaltyCustomer Action="E">
    <Retailer Id="1">
<HouseHold BuyingUnitInternalKey="2" HouseHoldExternalId="-1" SendEmail="false">
            <Members>
                <Member MemberInternalKey="2" MemberExternalId="-1" IsMainMember="true" LastName="Internal" FirstName="Use" StartDate="2012-10-02T12:42:00" RedemptionPrivileges="0" MemberStatus="0" AdressNormalizationUpdate="N">
                    <Cards>
                        <Card Id="-1" CardStatus="" IssueDate="2012-10-02T12:42:00" ExpirationDate="2056-12-31T00:00:00" />
                    </Cards>
                    <Stores>
                        <Store Id="26" StoreTypeId="1" IsHomeStore="true" />
                    </Stores>
                </Member>
            </Members>
        </HouseHold>
        <HouseHold BuyingUnitInternalKey="3" HouseHoldExternalId="244003000001" Country="11" State="223" City="Calgary" Street1="Main St" StreetNum="203" PostalCode="R4C 3R1" POBox="999999" HomePhone="438-439-1246" EMailAddress="[email protected]" SendEmail="true">
            <Accounts>
                <Account Id="1"  EarnValue="6422.0000"   RedeemValue="6049.0000" Balance="373.0000" LastUpdate="2013-05-10" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
                <Account Id="3"  EarnValue="10.0000"     RedeemValue="8.0000" Balance="2.0000" LastUpdate="2013-02-20" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
                <Account Id="4"  EarnValue="10.0000"     RedeemValue="7.0000" Balance="3.0000" LastUpdate="2013-02-20" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
                <Account Id="5"  EarnValue="8.0000"      RedeemValue="5.0000" Balance="3.0000" LastUpdate="2013-04-18" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
                <Account Id="6"  EarnValue="9.0000"      RedeemValue="8.0000" Balance="1.0000" LastUpdate="2013-02-20" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
                <Account Id="7"  EarnValue="7028.0000"   RedeemValue="6500.0000" Balance="528.0000" LastUpdate="2017-07-10" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
                <Account Id="8"  EarnValue="269319.0000" RedeemValue="269000.0000" Balance="319.0000" LastUpdate="2019-07-10" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
                <Account Id="10" EarnValue="688968.0000" RedeemValue="682641.0000" Balance="6327.0000" LastUpdate="2019-07-10" HousekeepingBalance="0.0000" HousekeepingTotalAccumulated="0.0000" HousekeepingTotalRedeemed="0.0000" />
            </Accounts>
            <HouseHoldSegments>
                <Segment Id="100" Status="1" AttachmentSourceId="1" />
                <Segment Id="500" Status="1" AttachmentSourceId="4" />
                <Segment Id="502" Status="1" AttachmentSourceId="8" />
                <Segment Id="531" Status="1" AttachmentSourceId="1" />
            </HouseHoldSegments>
            <Members>
                <Member MemberInternalKey="3" MemberExternalId="244003000001" IsMainMember="true" LastName="zabana" FirstName="Mike" BirthDate="1970-11-04" DriversLicense="drvlic" NationalInsuranceNumber="socsec" Remarks="Test Account" MobilePhoneNumber="438-439-1246" Gender="1" Title="1" StartDate="2013-01-21T14:28:00" EffectiveDate="2013-02-08T13:58:00" RedemptionPrivileges="0" LanguageId="0" NumberOfFamilyMembers="4" Anonimity="0" MemberStatus="1" ReceiptLayoutId="1" AdressNormalizationUpdate="N" UpdatedDate="2019-07-20T14:47:00" CommercialDriversLicense="comdrvlic">
                    <Cards>
                        <Card Id="244003000001" CardStatus="1" IssueDate="2013-01-21T00:00:00" ExpirationDate="2056-12-31T23:59:00" />
                    </Cards>
                    <Stores>
                        <Store Id="24" StoreTypeId="1" IsHomeStore="true" />
                    </Stores>
                    <MemberAttributes>
                        <Attribute Id="10004" Value="zabana MANAGEMENTS" />
                    </MemberAttributes>
                    <MemberAdditionalAddress />
                    <FamilyMembers>
                        <FamilyMember FamilyMemberId="1" TypeId="2" Name="Kathy" />
                        <FamilyMember FamilyMemberId="2" TypeId="1" Name="Melissa" BirthdayDate="1997-06-06" Gender="2" />
                        <FamilyMember FamilyMemberId="3" TypeId="1" Name="Trent" BirthdayDate="2000-08-29" Gender="1" />
                    </FamilyMembers>
                </Member>
                <Member MemberInternalKey="23612" MemberExternalId="244003000002" IsMainMember="false" LastName="zabana" FirstName="Kathy" BirthDate="1970-07-02" Remarks="Test Account for Family testing" Gender="2" Title="3" StartDate="2013-03-11T13:16:00" RedemptionPrivileges="0" LanguageId="0" PostOption="1" NumberOfFamilyMembers="1" Anonimity="0" MemberStatus="1" AdressNormalizationUpdate="N" UpdatedDate="2019-07-15T19:44:00">
                </Member>
                <Member MemberInternalKey="33421" MemberExternalId="244003000003" IsMainMember="false" LastName="zabana" FirstName="Trent" BirthDate="2000-08-29" Remarks="Test account to see how a different address and email affect the member export" Gender="1" Title="1" StartDate="2013-03-27T14:41:00" EffectiveDate="2017-06-26T13:55:00" RedemptionPrivileges="3" LanguageId="0" PostOption="1" NumberOfFamilyMembers="1" Anonimity="0" MemberStatus="1" AdressNormalizationUpdate="N" UpdatedDate="2019-07-05T18:07:00">
                </Member>
            </Members>
        </HouseHold>
    </Retailer>
</LoyaltyCustomer>

What i am trying to get is this output:

HouseHoldExternalId     IsMainMember    MemberExternalId    FirstName   LastName        StartDate           Id      Balance
-------------------     ------------    ----------------    ---------   --------        --------------      --      -------
-1                      TRUE            -1                  Use         Internal        10/2/2012 12:42     
244003000001            TRUE            244003000001        Mike        Zabana          1/21/2013 14:28     
244003000001            FALSE           244003000002        Kathy       zabana          3/11/2013 13:16     
244003000001            FALSE           244003000003        Trent       zabana          3/27/2013 14:41     
244003000001                                                                                                1       373
244003000001                                                                                                3       2
244003000001                                                                                                4       3
244003000001                                                                                                5       3
244003000001                                                                                                6       1
244003000001                                                                                                7       528
244003000001                                                                                                8       319
244003000001                                                                                                10      6327

I am using this script below in PowerShell:

$data = New-Object xml;
$data.load("C:\temp\parsing\mehdi.xml")

$mehdi = [pscustomobject]@{
"HouseHoldExternalId" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold").HouseHoldExternalId |Out-String
"MemberExternalId" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").MemberExternalId |Out-String
"IsMainMember" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").IsMainMember |Out-String
"LastName" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").LastName |Out-String
"FirstName" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").FirstName |Out-String
"StartDate" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Members/Member").StartDate |Out-String
"Id" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Accounts/Account").Id |Out-String
"Balance" = $data.SelectNodes("/LoyaltyCustomer/Retailer/HouseHold/Accounts/Account").Balance |Out-String }
$mehdi | ConvertTo-Csv -NoTypeInformation -Delimiter ";" | Set-Content -Path C:\temp\parsing\test2.csv -Encoding UTF8

and what i am getting is all data are under HouseholdExternalID column.

How can i get the same output? I will appreciate your assistance and advices. Thank you

CodePudding user response:

Seems like this should work to convert your XML into an object[] which you can export to CSV after. Basically, your expected output is a combination of 3 arrays:

  • $xml.LoyaltyCustomer.Retailer.Household to get the HouseHoldExternalId

  • $xml.LoyaltyCustomer.Retailer.Household.Members.Member to get IsMainMember, MemberExternalId, FirstName, LastName and StartDate

  • $xml.LoyaltyCustomer.Retailer.Household.Accounts.Account to get Id and Balance

$xml = [xml]::new()
$xml.Load("C:\temp\parsing\mehdi.xml")

$psobjectOut = {
    param($element, $subelement)

    [pscustomobject]@{
        HouseHoldExternalId = $element.HouseHoldExternalId
        IsMainMember = $subelement.IsMainMember
        MemberExternalId = $subelement.MemberExternalId
        FirstName = $subelement.FirstName
        LastName = $subelement.LastName
        StartDate = '{0}' -f $subelement.StartDate -as [datetime]
        Id = $subelement.Id
        Balance = $subelement.Balance
    }
}

$result = foreach($element in $xml.LoyaltyCustomer.Retailer.Household)
{
    foreach($subelement in $element.Members.Member)
    {
        & $psobjectOut -element $element -subelement $subelement
    }

    foreach($subelement in $element.Accounts.Account)
    {
        & $psobjectOut -element $element -subelement $subelement
    }
}

Inspecting at $result:

PS /> $result | Format-Table


HouseHoldExternalId IsMainMember MemberExternalId FirstName LastName StartDate             Id Balance
------------------- ------------ ---------------- --------- -------- ---------             -- -------
-1                  true         -1               Use       Internal 10/2/2012 12:42:00 PM    
244003000001        true         244003000001     Mike      zabana   1/21/2013 2:28:00 PM     
244003000001        false        244003000002     Kathy     zabana   3/11/2013 1:16:00 PM     
244003000001        false        244003000003     Trent     zabana   3/27/2013 2:41:00 PM     
244003000001                                                                               1  373.0000
244003000001                                                                               3  2.0000
244003000001                                                                               4  3.0000
244003000001                                                                               5  3.0000
244003000001                                                                               6  1.0000
244003000001                                                                               7  528.0000
244003000001                                                                               8  319.0000
244003000001                                                                               10 6327.0000

CodePudding user response:

Adding to Santiago's fine answer I wanted to attack this using xPath. So, I built off his work using an and came up with the below adjustment t

[XML]$XML = [XML]( Get-Content 'C:\Scripts\Example.xml' )

$HouseHolds = Select-Xml -Xml $Xml -XPath '/LoyaltyCustomer/Retailer/HouseHold'

$Output =
For($i = 0; $i -lt $HouseHolds.Count;   $i ) {
    $HouseHoldID    = $HouseHolds[$i].Node.HouseHoldExternalId
    $HouseHoldXpath = "/LoyaltyCustomer/Retailer/HouseHold[$($i 1)]"

    Select-Xml -Xml $XML -XPath "$HouseHoldXpath/Members/Member|$HouseHoldXpath/Accounts/Account" |    
    ForEach-Object{
        [PSCustomObject]@{
            HouseHoldExternalId = $HouseHoldID
            IsMainMember        = $_.Node.IsMainMember
            MemberExternalId    = $_.Node.MemberExternalId
            FirstName           = $_.Node.FirstName
            LastName            = $_.Node.LastName
            StartDate           = $_.Node.StartDate -as [DateTime]
            Id                  = $_.Node.Id
            Balance             = $_.Node.Balance
        }
    }
    
}

$Output | Format-Table -AutoSize

Returns:

HouseHoldExternalId IsMainMember MemberExternalId FirstName LastName StartDate             Id Balance
------------------- ------------ ---------------- --------- -------- ---------             -- -------  
-1                  true         -1               Use       Internal 10/2/2012 12:42:00 PM
244003000001                                                                               1  373.0000
244003000001                                                                               3  2.0000
244003000001                                                                               4  3.0000
244003000001                                                                               5  3.0000
244003000001                                                                               6  1.0000
244003000001                                                                               7  528.0000
244003000001                                                                               8  319.0000
244003000001                                                                               10 6327.0000
244003000001        true         244003000001     Mike      zabana   1/21/2013 2:28:00 PM
244003000001        false        244003000002     Kathy     zabana   3/11/2013 1:16:00 PM
244003000001        false        244003000003     Trent     zabana   3/27/2013 2:41:00 PM

Because of the combined xPath statement only 1 inner loop is required. Therefore, prepackaging an expression to output the PSCustomObject isn't really needed. That said, the ordering is a little off, presumably due to the manner in which the xPath query is run. All the results are there though.

An earlier revision that didn't use xPath quite as aggressively did have the correct order:

[XML]$XML = [XML]( Get-Content 'C:\Scripts\Example.xml' )

$HouseHolds = Select-Xml -Xml $Xml -XPath '/LoyaltyCustomer/Retailer/HouseHold'

$ObjectExpression = {
    [PSCustomObject]@{
        HouseHoldExternalId = $HouseHoldID
        IsMainMember        = $_.Node.IsMainMember
        MemberExternalId    = $_.Node.MemberExternalId
        FirstName           = $_.Node.FirstName
        LastName            = $_.Node.LastName
        StartDate           = $_.Node.StartDate -as [DateTime]
        Id                  = $_.Node.Id
        Balance             = $_.Node.Balance
    }
}

$Output =
For($i = 0; $i -lt $HouseHolds.Count;   $i ) {
    $HouseHoldID    = $HouseHolds[$i].Node.HouseHoldExternalId
    $HouseHoldXpath = "{0}{1}" -f $HouseHolds[$i].Pattern, "[$($i 1)]"
    
    Select-Xml -Xml $XML -XPath "$HouseHoldXpath/Members/Member" |
    ForEach-Object{ & $ObjectExpression }
    
    Select-Xml -Xml $XML -XPath "$HouseHoldXpath/Accounts/Account" |
    ForEach-Object{ & $ObjectExpression }
} 

$Output | Format-Table -AutoSize
  • Related