I have the following XML document (actually a TWB file -- Tableau) and I'm looking to extract the following elements into a tabular format.
The overall structure of the file is
The required fields are highlighted
I need to populate as the following
I was able to extract the worksheet name, but failing to proceed further (new to Powershell)
Sample XML:
<?xml version='1.0' encoding='utf-8'?>
<!-- build 20181.18.1008.2110 -->
<workbook original-version="18.1" source-build="2018.1.6 (20181.18.1008.2110)" source-platform="win" version="18.1" xml:base="https://tableau" xmlns:user="http://www.tableausoftware.com/xml/user">
<repository-location id="ApplicationInfo" path="/t/DEV/workbooks" revision="1.4" site="DEV" />
<datasources>
<datasource caption="App Info" inline="true" name="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9" version="18.1">
</datasource>
</datasources>
<worksheets>
<worksheet name="App Info ">
<table>
<view>
<datasources>
<datasource caption="App Info" name="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9" />
</datasources>
<datasource-dependencies datasource="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9">
<column aggregation="Count" caption="App Info Grp" datatype="string" default-type="nominal" layered="true" name="[App_Info_Grp]" pivot="key" role="dimension" type="nominal" user-datatype="string" visual-totals="Default" />
<column aggregation="Count" caption="Primary ID" datatype="string" default-type="nominal" layered="true" name="[Primary_ID]" pivot="key" role="dimension" type="nominal" user-datatype="string" visual-totals="Default" />
<column-instance column="[App_Info_Grp]" derivation="None" name="[none:App_Info_Grp:nk]" pivot="key" type="nominal" />
<column-instance column="[Primary_ID]" derivation="None" name="[none:Primary_ID:nk]" pivot="key" type="nominal" />
</datasource-dependencies>
</view>
</table>
</worksheet>
<worksheet name="App Info by PrimaryID">
<table>
<view>
<datasources>
<datasource caption="App Info" name="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9" />
</datasources>
<datasource-dependencies datasource="sqlproxy.1xdgpjx1g0jmum1f5c8nx1eniql9">
<column aggregation="Count" caption="App Info Type" datatype="string" default-type="nominal" layered="true" name="[App_Info_Type]" pivot="key" role="dimension" type="nominal" user-datatype="string" visual-totals="Default" />
<column aggregation="Count" datatype="string" default-type="nominal" layered="true" name="[EIN]" pivot="key" role="dimension" type="nominal" user-datatype="string" visual-totals="Default" />
<column-instance column="[App_Info_Type]" derivation="None" name="[none:App_Info_Type:nk]" pivot="key" type="nominal" />
<column-instance column="[EIN]" derivation="None" name="[none:EIN:nk]" pivot="key" type="nominal" />
</datasource-dependencies>
</view>
</table>
</worksheet>
</worksheets>
</workbook>
Any help is appreciated!
CodePudding user response:
Not sure this is exactly what you need but it might give you a hint on how to approach the logic to parse your XML.
$result = foreach($file in Get-ChildItem *.xml) {
[xml]$xml = Get-Content $file
foreach($ws in $xml.workbook.worksheets.worksheet) {
$wsname = $ws.name
foreach($tb in $ws.table) {
$tbname = $tb.view.datasources.datasource.caption
foreach($col in $tb.view.'datasource-dependencies'.'column-instance'.column) {
[pscustomobject]@{
FileName = $file.Name
'DATASOURCE_CAPTION' = $tbname
'WORKSHEET NAME' = $wsname
'COLUMN-INSTANCE_COLUMN' = $col
}
}
}
}
}
$result | Format-Table
I stored your XML as example.xml (FileName
property), this results in the following:
FileName DATASOURCE_CAPTION WORKSHEET NAME COLUMN-INSTANCE_COLUMN
-------- ------------------ -------------- ----------------------
example.xml App Info App Info [App_Info_Grp]
example.xml App Info App Info [Primary_ID]
example.xml App Info App Info by PrimaryID [App_Info_Type]
example.xml App Info App Info by PrimaryID [EIN]