Home > Back-end >  Extract XML elements to tabular format using Powershell
Extract XML elements to tabular format using Powershell

Time:03-06

I have the following XML document (actually a TWB file -- Tableau) and I'm looking to extract the following elements into a tabular format.

The overall structure of the file is XML structure

The required fields are highlighted Fields needed for each "worksheet"

I need to populate as the following Tabular format

I was able to extract the worksheet name, but failing to proceed further (new to Powershell) Current Progress

Sample XML:

<?xml version='1.0' encoding='utf-8'?>
<!-- build 20181.18.1008.2110                               -->
<workbook original-version="18.1" source-build="2018.1.6 (20181.18.1008.2110)" source-platform="win" version="18.1" xml:base="https://tableau" xmlns:user="http://www.tableausoftware.com/xml/user">
  <repository-location id="ApplicationInfo" path="/t/DEV/workbooks" revision="1.4" site="DEV" />
  <datasources>
    <datasource caption="App Info" inline="true" name="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9" version="18.1">
    </datasource>
  </datasources>
  <worksheets>
    <worksheet name="App Info ">
      <table>
        <view>
          <datasources>
            <datasource caption="App Info" name="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9" />
          </datasources>
          <datasource-dependencies datasource="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9">
            <column aggregation="Count" caption="App Info Grp" datatype="string" default-type="nominal" layered="true" name="[App_Info_Grp]" pivot="key" role="dimension" type="nominal" user-datatype="string" visual-totals="Default" />
            <column aggregation="Count" caption="Primary ID" datatype="string" default-type="nominal" layered="true" name="[Primary_ID]" pivot="key" role="dimension" type="nominal" user-datatype="string" visual-totals="Default" />
            <column-instance column="[App_Info_Grp]" derivation="None" name="[none:App_Info_Grp:nk]" pivot="key" type="nominal" />
            <column-instance column="[Primary_ID]" derivation="None" name="[none:Primary_ID:nk]" pivot="key" type="nominal" />
          </datasource-dependencies>
        </view>
      </table>
    </worksheet>
    <worksheet name="App Info by PrimaryID">
      <table>
        <view>
          <datasources>
            <datasource caption="App Info" name="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9" />
          </datasources>
          <datasource-dependencies datasource="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9">
            <column aggregation="Count" caption="App Info Type" datatype="string" default-type="nominal" layered="true" name="[App_Info_Type]" pivot="key" role="dimension" type="nominal" user-datatype="string" visual-totals="Default" />
            <column aggregation="Count" datatype="string" default-type="nominal" layered="true" name="[EIN]" pivot="key" role="dimension" type="nominal" user-datatype="string" visual-totals="Default" />
            <column-instance column="[App_Info_Type]" derivation="None" name="[none:App_Info_Type:nk]" pivot="key" type="nominal" />
            <column-instance column="[EIN]" derivation="None" name="[none:EIN:nk]" pivot="key" type="nominal" />
          </datasource-dependencies>
        </view>
      </table>
    </worksheet>
  </worksheets>
</workbook>

Any help is appreciated!

CodePudding user response:

Not sure this is exactly what you need but it might give you a hint on how to approach the logic to parse your XML.

$result = foreach($file in Get-ChildItem *.xml) {
    [xml]$xml = Get-Content $file
    foreach($ws in $xml.workbook.worksheets.worksheet) {
        $wsname = $ws.name
        foreach($tb in $ws.table) {
            $tbname = $tb.view.datasources.datasource.caption
            foreach($col in $tb.view.'datasource-dependencies'.'column-instance'.column) {
                [pscustomobject]@{
                    FileName                 = $file.Name
                    'DATASOURCE_CAPTION'     = $tbname
                    'WORKSHEET NAME'         = $wsname
                    'COLUMN-INSTANCE_COLUMN' = $col
                }
            }
        }
    }
}

$result | Format-Table

I stored your XML as example.xml (FileName property), this results in the following:

FileName    DATASOURCE_CAPTION WORKSHEET NAME        COLUMN-INSTANCE_COLUMN
--------    ------------------ --------------        ----------------------
example.xml App Info           App Info              [App_Info_Grp]
example.xml App Info           App Info              [Primary_ID]
example.xml App Info           App Info by PrimaryID [App_Info_Type]
example.xml App Info           App Info by PrimaryID [EIN]
  • Related