Background
I have an Excel spreadsheet that I saved as a 2003 XML spreadsheet, and I have pasted it into a console-mode VB.NET program created with VS 2022.
Goal
I want to automate adding some columns using VB.NET.
Observations
I see that Excel makes extensive use of XML namespaces:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<Worksheet ss:Name="197 Industry Groups">
<Table ss:ExpandedColumnCount="10" ss:ExpandedRowCount="198" x:FullColumns="1"
x:FullRows="1" ss:DefaultColumnWidth="42" ss:DefaultRowHeight="11.25">
<Row>
<Cell><Data ss:Type="String">Order</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>
<Cell><Data ss:Type="String">Symbol</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">2</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>
<Cell ss:StyleID="s62" ss:HRef="https://marketsmith.investors.com/mstool?Symbol=G1000"><Data
ss:Type="String">G1000</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>
Plan
Use XPath to get a reference to the first row and add a cell to it, using this post as a guide.
Questions
- Which namespaces do I need to add to the namespaceManager? What do I use for the second (uri) argument to the AddNamespace function? Do I need to add an empty namespace?
- Why do the values g2 and g1000 get the value Nothing?
Dim g2 = industryGroups.XPathSelectElement("//ss:Workbook/ss:Worksheet/ss:Table/ss:Row[0]", namespaceManager)
Dim g1000 = (From p In industryGroups.Descendants("Table") Select p).FirstOrDefault()
- How do I add a new XElement (cell) using the other cells as a pattern (i.e. each cell contains a Data element with an ss:Type attribute, and in some cases the cell also has an ss:HRef attribute)? Can I use the XML literal feature and embed my computed hrefs, strings and numbers?
Thanks!
Siegfried
Answer:
The following should be helpful for questions #1 and #2:
Add the following Imports statements:
Imports System.Xml
Imports System.Xml.Linq
Imports System.Xml.XPath
Code:
Dim xDoc As XDocument = XDocument.Load(filename)
Debug.WriteLine($"xDoc: {xDoc.ToString()}")
'add namespaces that exist in XML file
Dim nsSS As XNamespace = "urn:schemas-microsoft-com:office:spreadsheet"
'Dim dataItems = From x In xDoc.Descendants(nsSS + "Workbook").Descendants(nsSS + "Worksheet").Descendants(nsSS + "Table").Descendants(nsSS + "Row").Descendants(nsSS + "Cell").Descendants(nsSS + "Data") Select x
'Dim dataItems = From x In xDoc.Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Workbook").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Worksheet").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Table").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Row").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Cell").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Data") Select x
'XNamespace + local name yields the fully qualified XName that Descendants needs
Dim dataItems = From x In xDoc.Descendants(nsSS + "Table").Descendants(nsSS + "Row").Descendants(nsSS + "Cell").Descendants(nsSS + "Data") Select x
Debug.WriteLine($"dataItems: {dataItems.Count.ToString()} Type: {dataItems.GetType.ToString()}")
If dataItems IsNot Nothing Then
For Each item In dataItems
Debug.WriteLine($"{item.ToString()}")
Next
End If
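This also bears on question #2: Descendants("Table") looks for a Table element in no namespace, while every element in the file inherits the default namespace declared on Workbook, so the query finds nothing. The following is a minimal sketch of that difference, not a drop-in fix. The module name, the helper functions and the tiny inline document are all made up for illustration; only the namespace URI comes from the file.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module NamespaceLookupSketch
    ' Namespace URI copied from the xmlns declaration on <Workbook>.
    Private ReadOnly nsSS As XNamespace = "urn:schemas-microsoft-com:office:spreadsheet"

    ' Tiny stand-in for the real spreadsheet (assumption: same default namespace).
    Function LoadSample() As XDocument
        Return XDocument.Parse(
            "<Workbook xmlns=""urn:schemas-microsoft-com:office:spreadsheet"">" &
            "<Worksheet><Table><Row><Cell><Data>Order</Data></Cell></Row></Table></Worksheet>" &
            "</Workbook>")
    End Function

    ' Unqualified lookup: searches for <Table> in no namespace.
    Function FindUnqualified(doc As XDocument) As XElement
        Return doc.Descendants("Table").FirstOrDefault()
    End Function

    ' Qualified lookup: namespace plus local name, which is the element's real name.
    Function FindQualified(doc As XDocument) As XElement
        Return doc.Descendants(nsSS + "Table").FirstOrDefault()
    End Function

    Sub Main()
        Dim doc = LoadSample()
        Console.WriteLine(FindUnqualified(doc) Is Nothing)   ' True
        Console.WriteLine(FindQualified(doc) IsNot Nothing)  ' True
    End Sub
End Module
```

So in the question's code, Descendants(nsSS + "Table") in place of Descendants("Table") should return the element.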
Code:
Dim xDoc As XDocument = XDocument.Load(filename)
'Debug.WriteLine($"xDoc: {xDoc.ToString()}")
'add namespaces that exist in XML file
Dim nsMgr = New XmlNamespaceManager(New NameTable())
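'XPath 1.0 never applies a default namespace to unprefixed names, so this empty-prefix entry has no effect on the query below; the "ss" prefix is what matters
nsMgr.AddNamespace("", "urn:schemas-microsoft-com:office:spreadsheet")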
nsMgr.AddNamespace("o", "urn:schemas-microsoft-com:office:office")
nsMgr.AddNamespace("x", "urn:schemas-microsoft-com:office:excel")
nsMgr.AddNamespace("ss", "urn:schemas-microsoft-com:office:spreadsheet")
nsMgr.AddNamespace("html", "http://www.w3.org/TR/REC-html40")
Dim g2 = xDoc.XPathSelectElement("ss:Workbook/ss:Worksheet/ss:Table", nsMgr)
Debug.WriteLine($"g2: {g2.ToString()}")
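The other reason g2 is Nothing in the question is the Row[0] predicate: XPath positions start at 1, so no node ever has position 0. Below is a small sketch of that behaviour. The module, the SelectRow helper and the two-row inline document are hypothetical, and only the "ss" prefix is registered because it is the only one the expression uses.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.Linq
Imports System.Xml.XPath

Module XPathPositionSketch
    ' Two-row stand-in for the real spreadsheet (assumption: same default namespace).
    Function LoadSample() As XDocument
        Return XDocument.Parse(
            "<Workbook xmlns=""urn:schemas-microsoft-com:office:spreadsheet"">" &
            "<Worksheet><Table>" &
            "<Row><Cell><Data>Order</Data></Cell></Row>" &
            "<Row><Cell><Data>2</Data></Cell></Row>" &
            "</Table></Worksheet></Workbook>")
    End Function

    ' Selects the row at the given XPath position (positions start at 1).
    Function SelectRow(doc As XDocument, position As Integer) As XElement
        Dim nsMgr As New XmlNamespaceManager(New NameTable())
        ' Only prefixes that appear in the XPath expression need to be registered.
        nsMgr.AddNamespace("ss", "urn:schemas-microsoft-com:office:spreadsheet")
        Dim path = $"/ss:Workbook/ss:Worksheet/ss:Table/ss:Row[{position}]"
        Return doc.XPathSelectElement(path, nsMgr)
    End Function

    Sub Main()
        Dim doc = LoadSample()
        Console.WriteLine(SelectRow(doc, 0) Is Nothing)  ' True: no node has position 0
        Console.WriteLine(SelectRow(doc, 1).Value)       ' Order
    End Sub
End Module
```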
Test.xml:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>TestUser</Author>
<LastAuthor>TestUser</LastAuthor>
<Created>2022-04-02T14:35:55Z</Created>
<LastSaved>2022-04-02T14:37:37Z</LastSaved>
<Version>16.00</Version>
</DocumentProperties>
<OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
<AllowPNG/>
</OfficeDocumentSettings>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>5955</WindowHeight>
<WindowWidth>17970</WindowWidth>
<WindowTopX>32767</WindowTopX>
<WindowTopY>32767</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font ss:FontName="Calibri" x:Family="Swiss" ss:Size="11" ss:Color="#000000"/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s62">
<NumberFormat ss:Format="0"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="3" x:FullColumns="1"
x:FullRows="1" ss:DefaultRowHeight="15">
<Column ss:StyleID="s62"/>
<Row>
<Cell><Data ss:Type="String">Id</Data></Cell>
<Cell><Data ss:Type="String">FirstName</Data></Cell>
<Cell><Data ss:Type="String">LastName</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">1</Data></Cell>
<Cell><Data ss:Type="String">John</Data></Cell>
<Cell><Data ss:Type="String">Smith</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">2</Data></Cell>
<Cell><Data ss:Type="String">Bob</Data></Cell>
<Cell><Data ss:Type="String">Seagul</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.3"/>
<Footer x:Margin="0.3"/>
<PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
</PageSetup>
<Print>
<ValidPrinterInfo/>
<HorizontalResolution>600</HorizontalResolution>
<VerticalResolution>600</VerticalResolution>
</Print>
<Selected/>
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>2</ActiveRow>
<ActiveCol>2</ActiveCol>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
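For question #3, one possible approach is to build the new cell with XElement and XAttribute constructors, qualifying every name with the spreadsheet namespace. This is only a sketch: BuildCell, the module name, the inline document and the example.com URL are made up for illustration. VB XML literals with embedded expressions (the <%= ... %> syntax) could likely be used instead, but that variant is not shown here.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module AddCellSketch
    ' Namespace URI copied from the xmlns:ss declaration on <Workbook>.
    Private ReadOnly nsSS As XNamespace = "urn:schemas-microsoft-com:office:spreadsheet"

    ' One-row stand-in for Test.xml (assumption: same namespace declarations).
    Function LoadSample() As XDocument
        Return XDocument.Parse(
            "<Workbook xmlns=""urn:schemas-microsoft-com:office:spreadsheet""" &
            " xmlns:ss=""urn:schemas-microsoft-com:office:spreadsheet"">" &
            "<Worksheet ss:Name=""Sheet1""><Table><Row>" &
            "<Cell><Data ss:Type=""String"">Id</Data></Cell>" &
            "</Row></Table></Worksheet></Workbook>")
    End Function

    ' Builds a Cell holding a String-typed Data element, adding ss:HRef when a link is given.
    Function BuildCell(text As String, Optional href As String = Nothing) As XElement
        Dim cell As New XElement(nsSS + "Cell",
            New XElement(nsSS + "Data",
                New XAttribute(nsSS + "Type", "String"),
                text))
        If href IsNot Nothing Then
            cell.SetAttributeValue(nsSS + "HRef", href)
        End If
        Return cell
    End Function

    Sub Main()
        Dim doc = LoadSample()
        Dim firstRow As XElement = doc.Descendants(nsSS + "Row").First()
        ' Hypothetical computed values for the new column.
        firstRow.Add(BuildCell("G1000", "https://example.com/lookup?Symbol=G1000"))
        Console.WriteLine(firstRow.Elements(nsSS + "Cell").Count())  ' 2
    End Sub
End Module
```

When adding columns to a real file, the ss:ExpandedColumnCount attribute on Table probably needs to be raised to match. That is an assumption worth checking against your own spreadsheet.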