I have got the following code, which is supposed to scrape the table and print the result. I am expecting to get them into a table
County| Shropshire Address| Adams Grammar School, High Street, Newport TF10 7BD etc
but i am getting the following:
CountyShropshire AddressAdams Grammar School, High Street, Newport TF10 7BD TypeBoys Grammar Pupils799
so no speace inside.
what am i missing?
Sub scrapeschools()
Dim ie As New SHDocVw.InternetExplorer
Dim htmldoc As MSHTML.HTMLDocument
Dim htmltable As MSHTML.htmltable
Dim tablerow As MSHTML.IHTMLElement
Dim tablecell As MSHTML.IHTMLElement
Dim tablecol As MSHTML.IHTMLElement
ie.Visible = True
ie.navigate "https://www.11plusguide.com/grammar-school-test-areas/wolverhampton-shropshire-walsall/adams-grammar-school/"
Do While ie.readyState < READYSTATE_COMPLETE Or ie.Busy
Loop
Set htmldoc = ie.document
Set htmltable = htmldoc.getElementsByTagName("table")(0)
For Each tablerow In htmltable.Children
For Each tablecell In tablerow.Children
Debug.Print tablecell.innerText
Next tablecell
Next tablerow
End Sub
CodePudding user response:
Your elements in the loop are not of the class you are expecting (based on your variable naming).
This has gone without raising an error as you have declared the elements involved in the loop generically as MSHTML.IHTMLElement
.
A debugging step could be to study the relevant HTML in order to understand what is produced by htmltable.Children
.
Add a couple of Debug.Print TypeName(foo)
calls within your loop to see what you are actually working with:
For Each tablerow In htmltable.Children
Debug.Print TypeName(tablerow)
For Each tablecell In tablerow.Children
Debug.Print TypeName(tablecell)
Next tablecell
Next tablerow
You are initially working with a section, then you are looping table rows not table cells, hence your squished output.
It is an easy fix.
- Use the
.Rows
property to ensure you are looping table rows and - Use a stronger typing in the declarations
Dim tablerow As MSHTML.IHTMLTableRow
Dim tablecell As MSHTML.IHTMLTableCell
For Each tablerow In htmltable.Rows
For Each tablecell In tablerow.Children
Debug.Print tablecell.innerText
Next tablecell
Next tablerow
Personally, I would:
- See if this page could be scraped using XHR requests as this would be faster and remove the now no longer supported IE
- Select the table with a more specific, and hopefully more stable over time, css selector list
- Use the clipboard to copy paste the table element OuterHTML to Excel. This would preserve the table formatting and avoid the added complexity of nested looping