How to add columns to MS Office Excel Spreadsheet stored as VB.NET XML Literal

Time:04-03

Background

I have an Excel spreadsheet that I saved in the XML Spreadsheet 2003 format, and I have pasted its contents as an XML literal into a console-mode VB.NET program created with VS 2022.

Goal

I want to automate adding some columns using VB.NET.

Observations

I see that MS Excel makes extensive use of XML namespaces:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
   xmlns:o="urn:schemas-microsoft-com:office:office"
   xmlns:x="urn:schemas-microsoft-com:office:excel"
   xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
   xmlns:html="http://www.w3.org/TR/REC-html40">
  <Worksheet ss:Name="197 Industry Groups">
    <Table ss:ExpandedColumnCount="10" ss:ExpandedRowCount="198" x:FullColumns="1"
      x:FullRows="1" ss:DefaultColumnWidth="42" ss:DefaultRowHeight="11.25">
      <Row>
        <Cell><Data ss:Type="String">Order</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>
        <Cell><Data ss:Type="String">Symbol</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="Number">2</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>
        <Cell ss:StyleID="s62" ss:HRef="https://marketsmith.investors.com/mstool?Symbol=G1000"><Data ss:Type="String">G1000</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>

Plan

Use XPath to get a reference to the first row and add a cell to it, using this post as a guide.

Questions

  1. What namespaces do I need to add to the namespaceManager? What do I pass as the second (uri) argument to the AddNamespace function? Do I need to add an empty namespace?
  2. Why do the variables g2 and g1000 get the value Nothing?
        Dim g2 = industryGroups.XPathSelectElement("//ss:Workbook/ss:Worksheet/ss:Table/ss:Row[0]", namespaceManager)
        Dim g1000 = (From p In industryGroups.Descendants("Table") Select p).FirstOrDefault()

  3. How do I add a new XElement (cell) using the other cells as a pattern (i.e. each cell contains a Data element with an attribute, and in some cases the cell also has an HRef attribute)? Can I use the XML literal feature and embed my computed hrefs, strings and numbers?

Thanks!

Siegfried

CodePudding user response:

The following should be helpful for questions #1 and #2:

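Add the following Imports statements (System.Xml.Linq provides XDocument and XNamespace; VB.NET projects often import it by default):

  • Imports System.Xml
  • Imports System.Xml.Linq
  • Imports System.Xml.XPath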

Code (LINQ to XML):

Dim xDoc As XDocument = XDocument.Load(filename)

Debug.WriteLine($"xDoc: {xDoc.ToString()}")

'add namespaces that exist in XML file
Dim nsSS As XNamespace = "urn:schemas-microsoft-com:office:spreadsheet"

'element names must be qualified with the namespace: nsSS + "Table", not just "Table"
'Dim dataItems = From x In xDoc.Descendants(nsSS + "Workbook").Descendants(nsSS + "Worksheet").Descendants(nsSS + "Table").Descendants(nsSS + "Row").Descendants(nsSS + "Cell").Descendants(nsSS + "Data") Select x
'Dim dataItems = From x In xDoc.Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Workbook").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Worksheet").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Table").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Row").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Cell").Descendants("{urn:schemas-microsoft-com:office:spreadsheet}Data") Select x
Dim dataItems = From x In xDoc.Descendants(nsSS + "Table").Descendants(nsSS + "Row").Descendants(nsSS + "Cell").Descendants(nsSS + "Data") Select x

Debug.WriteLine($"dataItems: {dataItems.Count()} Type: {dataItems.GetType()}")

If dataItems IsNot Nothing Then
    For Each item In dataItems
        Debug.WriteLine($"{item.ToString()}")
    Next
End If
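
This also explains why g1000 in the question is Nothing: Descendants("Table") looks for a Table element in no namespace, while every element in the file lives in the spreadsheet namespace declared on Workbook. The following is a minimal sketch of that difference, not the asker's actual program; the module name, the CountTables helper and the inline sample XML are made up for illustration.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module NamespaceLookupSketch
    ' Namespace URI copied from the xmlns / xmlns:ss declarations on <Workbook>.
    Private ReadOnly nsSS As XNamespace = "urn:schemas-microsoft-com:office:spreadsheet"

    ' Hypothetical helper: counts <Table> elements found without and with
    ' the namespace. Item1 = unqualified lookup, Item2 = qualified lookup.
    Public Function CountTables(workbook As XElement) As Tuple(Of Integer, Integer)
        Dim unqualified = workbook.Descendants("Table").Count()
        Dim qualified = workbook.Descendants(nsSS + "Table").Count()
        Return Tuple.Create(unqualified, qualified)
    End Function

    Sub Main()
        ' Tiny stand-in for the real workbook (assumption: same namespaces).
        Dim workbook = XElement.Parse(
            "<Workbook xmlns='urn:schemas-microsoft-com:office:spreadsheet'" &
            " xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>" &
            "<Worksheet ss:Name='Sheet1'><Table><Row/></Table></Worksheet>" &
            "</Workbook>")

        Dim counts = CountTables(workbook)
        ' The unqualified lookup finds nothing; the qualified one finds the table.
        Console.WriteLine($"Unqualified: {counts.Item1}, qualified: {counts.Item2}")
    End Sub
End Module
```

With FirstOrDefault() on the unqualified query, an empty result becomes Nothing, which matches what the question reports.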

Code (XPath):

Dim xDoc As XDocument = XDocument.Load(filename)

'Debug.WriteLine($"xDoc: {xDoc.ToString()}")

'add namespaces that exist in XML file
'the first argument is the prefix used in the XPath expression, the second is the namespace URI from the file
'an empty prefix is not needed: XPath 1.0 never applies a default namespace to unprefixed names
Dim nsMgr = New XmlNamespaceManager(New NameTable())
nsMgr.AddNamespace("o", "urn:schemas-microsoft-com:office:office")
nsMgr.AddNamespace("x", "urn:schemas-microsoft-com:office:excel")
nsMgr.AddNamespace("ss", "urn:schemas-microsoft-com:office:spreadsheet")
nsMgr.AddNamespace("html", "http://www.w3.org/TR/REC-html40")

'XPath positions are 1-based: the first row is ss:Row[1]; ss:Row[0] never matches and yields Nothing
Dim g2 = xDoc.XPathSelectElement("ss:Workbook/ss:Worksheet/ss:Table", nsMgr)
Debug.WriteLine($"g2: {g2.ToString()}")
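
For question #3, one possible approach is sketched below. It selects the first row with a 1-based XPath position and appends a cell built with XElement constructors. The names FirstRowSketch, MakeStringCell and SelectRow, the "Notes" header and the inline sample XML are assumptions for illustration only. XML literals with embedded expressions should work as well once the namespaces are declared for the literal, but this sketch sticks to constructors to keep the namespace handling explicit.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml
Imports System.Xml.Linq
Imports System.Xml.XPath

Module FirstRowSketch
    Private ReadOnly nsSS As XNamespace = "urn:schemas-microsoft-com:office:spreadsheet"

    ' Hypothetical helper: builds <Cell><Data ss:Type="String">text</Data></Cell>.
    ' An optional href becomes an ss:HRef attribute, as on the hyperlinked cells.
    Public Function MakeStringCell(text As String,
                                   Optional href As String = Nothing) As XElement
        Dim cell = New XElement(nsSS + "Cell",
                       New XElement(nsSS + "Data",
                           New XAttribute(nsSS + "Type", "String"),
                           text))
        If href IsNot Nothing Then
            cell.SetAttributeValue(nsSS + "HRef", href)
        End If
        Return cell
    End Function

    ' Hypothetical helper: selects a row by XPath position (1-based).
    ' Returns Nothing when no row matches, e.g. for position 0.
    Public Function SelectRow(xDoc As XDocument, position As Integer) As XElement
        Dim nsMgr = New XmlNamespaceManager(New NameTable())
        nsMgr.AddNamespace("ss", nsSS.NamespaceName)
        Return xDoc.XPathSelectElement(
            $"/ss:Workbook/ss:Worksheet/ss:Table/ss:Row[{position}]", nsMgr)
    End Function

    Sub Main()
        ' Small stand-in document with one header row of two cells.
        Dim xDoc = XDocument.Parse(
            "<Workbook xmlns='urn:schemas-microsoft-com:office:spreadsheet'" &
            " xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>" &
            "<Worksheet ss:Name='Sheet1'><Table><Row>" &
            "<Cell><Data ss:Type='String'>Order</Data></Cell>" &
            "<Cell><Data ss:Type='String'>Symbol</Data></Cell>" &
            "</Row></Table></Worksheet></Workbook>")

        ' Position 0 matches nothing, which is why g2 was Nothing in the question.
        Console.WriteLine(SelectRow(xDoc, 0) Is Nothing)

        Dim header = SelectRow(xDoc, 1)
        header.Add(MakeStringCell("Notes"))
        Console.WriteLine(header.Elements(nsSS + "Cell").Count())
    End Sub
End Module
```

Note that the Type and HRef attributes are created with nsSS + "..." as well; a plain "Type" attribute would not be the ss:Type that Excel writes.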

Test.xml:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
  <Author>TestUser</Author>
  <LastAuthor>TestUser</LastAuthor>
  <Created>2022-04-02T14:35:55Z</Created>
  <LastSaved>2022-04-02T14:37:37Z</LastSaved>
  <Version>16.00</Version>
 </DocumentProperties>
 <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
  <AllowPNG/>
 </OfficeDocumentSettings>
 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
  <WindowHeight>5955</WindowHeight>
  <WindowWidth>17970</WindowWidth>
  <WindowTopX>32767</WindowTopX>
  <WindowTopY>32767</WindowTopY>
  <ProtectStructure>False</ProtectStructure>
  <ProtectWindows>False</ProtectWindows>
 </ExcelWorkbook>
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Calibri" x:Family="Swiss" ss:Size="11" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s62">
   <NumberFormat ss:Format="0"/>
  </Style>
 </Styles>
 <Worksheet ss:Name="Sheet1">
  <Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="3" x:FullColumns="1"
   x:FullRows="1" ss:DefaultRowHeight="15">
   <Column ss:StyleID="s62"/>
   <Row>
    <Cell><Data ss:Type="String">Id</Data></Cell>
    <Cell><Data ss:Type="String">FirstName</Data></Cell>
    <Cell><Data ss:Type="String">LastName</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="String">John</Data></Cell>
    <Cell><Data ss:Type="String">Smith</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="Number">2</Data></Cell>
    <Cell><Data ss:Type="String">Bob</Data></Cell>
    <Cell><Data ss:Type="String">Seagul</Data></Cell>
   </Row>
  </Table>
  <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   <PageSetup>
    <Header x:Margin="0.3"/>
    <Footer x:Margin="0.3"/>
    <PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
   </PageSetup>
   <Print>
    <ValidPrinterInfo/>
    <HorizontalResolution>600</HorizontalResolution>
    <VerticalResolution>600</VerticalResolution>
   </Print>
   <Selected/>
   <Panes>
    <Pane>
     <Number>3</Number>
     <ActiveRow>2</ActiveRow>
     <ActiveCol>2</ActiveCol>
    </Pane>
   </Panes>
   <ProtectObjects>False</ProtectObjects>
   <ProtectScenarios>False</ProtectScenarios>
  </WorksheetOptions>
 </Worksheet>
</Workbook>
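
To add a whole column to a file shaped like Test.xml, something along these lines might work. It is a sketch under several assumptions: the first Row is the header, rows are dense (no cell uses ss:Index to skip columns), and ss:ExpandedColumnCount should grow with the new column so that Excel accepts the file. The module name, the AddColumn helper and the "FullName" column are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module AddColumnSketch
    Private ReadOnly nsSS As XNamespace = "urn:schemas-microsoft-com:office:spreadsheet"

    ' Hypothetical helper: appends one String cell to every row of a <Table>.
    ' The first row gets headerText; other rows get valueFor(row).
    Public Sub AddColumn(table As XElement,
                         headerText As String,
                         valueFor As Func(Of XElement, String))
        Dim isHeader = True
        For Each row In table.Elements(nsSS + "Row")
            Dim text = If(isHeader, headerText, valueFor(row))
            row.Add(New XElement(nsSS + "Cell",
                        New XElement(nsSS + "Data",
                            New XAttribute(nsSS + "Type", "String"),
                            text)))
            isHeader = False
        Next

        ' Keep the declared column count in step with the cells just added.
        Dim countAttr = table.Attribute(nsSS + "ExpandedColumnCount")
        If countAttr IsNot Nothing Then
            countAttr.Value = (Integer.Parse(countAttr.Value) + 1).ToString()
        End If
    End Sub

    Sub Main()
        ' Trimmed-down copy of the Table from Test.xml.
        Dim workbook = XElement.Parse(
            "<Workbook xmlns='urn:schemas-microsoft-com:office:spreadsheet'" &
            " xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>" &
            "<Worksheet ss:Name='Sheet1'><Table ss:ExpandedColumnCount='3'>" &
            "<Row><Cell><Data ss:Type='String'>Id</Data></Cell>" &
            "<Cell><Data ss:Type='String'>FirstName</Data></Cell>" &
            "<Cell><Data ss:Type='String'>LastName</Data></Cell></Row>" &
            "<Row><Cell><Data ss:Type='Number'>1</Data></Cell>" &
            "<Cell><Data ss:Type='String'>John</Data></Cell>" &
            "<Cell><Data ss:Type='String'>Smith</Data></Cell></Row>" &
            "</Table></Worksheet></Workbook>")

        Dim table = workbook.Descendants(nsSS + "Table").First()

        ' Computed column: second and third cell joined with a space.
        AddColumn(table, "FullName",
                  Function(row)
                      Dim cells = row.Elements(nsSS + "Cell").
                                      Select(Function(c) c.Value).ToList()
                      Return cells(1) & " " & cells(2)
                  End Function)

        For Each row In table.Elements(nsSS + "Row")
            Console.WriteLine(String.Join(", ",
                row.Elements(nsSS + "Cell").Select(Function(c) c.Value)))
        Next
    End Sub
End Module
```

After the change, the document can be written back with the Save method of XDocument or XElement.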
