Home > front end >  Is there a way to get HTML text in VBA?
Is there a way to get HTML text in VBA?

Time:05-23

The following is a sample of the HTML Document I am working with

<tbody>
  <tr>
    <td>
    Some text here
    <div >Some title here</div>
    </td>
  </tr>
</tbody>

My code is the following:

With HTMLDocument.getElementsByTagName("table")(iTable)
    For Each Tr In .Rows
        For Each Td In Tr.Cells
            Debug.Print Td.innerText
            Sheets(UserDataSheetName).Cells(iRow, iCol).Value = Td.innerText
            iCol = iCol   1
        Next Td
        iCol = Col_Num_To_Start
        iRow = iRow   1
    Next Tr
End With

Currently, what happens is that Td.innerText in my code not only gets the text "Some text here", but it also saves "Some title here" into my excel spreadsheet. Is there a way I can do it such that I only save "Some text here" into the spreadsheet?

CodePudding user response:

To make the macro work with an external source such as a webpage, we’ll need Microsoft’s Internet Explore (to open the page and will remain hidden). In-addition, to read and extract contents of HTML elements, we’ll have to create few objects using a library.

Therefore, first add the Microsoft HTML Object Library reference to the application. From the top menu of your VBA editor, click Tools -> References…. In the References window, find and select Microsoft HTML Object Library and click OK.

CodePudding user response:

If you work with methods like getElementsByTagName(), you work with the DOM (Document Object Model). The DOM builds a tree of nodes from all elements of a page. There are 12 different node types. One of them is text. A textnode is text which stands alone without a html tag. So you can't get it's value with innertext. You have to use textContent. To pick the right node, you can pick a html node before and than use commands like FirstChild and NextSibling to reach the wanted text node.

Look here for more infos about DOM nodes:
https://www.w3schools.com/jsref/prop_node_nodetype.asp

For your example it works in this way:

Sub GetTextNode()

  Dim html As String
  Dim doc As Object
  
  html = "<table><tbody><tr><td>Some text here<div class='some class here'>Some title here</div></td></tr></tbody></table>"
  Set doc = CreateObject("htmlFile")
  doc.body.innerhtml = html
  
  MsgBox doc.getElementsByTagName("td")(0).FirstChild.textContent

End Sub
  • Related