Home > database >  Scraping table behind login wall
Scraping table behind login wall

Time:02-04

I am struggling to get the right piece of code to scrape a table that is being a password protected website into an excel workbook. I have been able to get all of the code to work up to the scraping of the table part. When I run the code, it opens IE, logins in but then errors out (91: Object variable or WITH block variable not set). The code is below:

Private Sub CommandButton3_Click()
Declare variables
  Dim IE As Object
  Dim Doc As HTMLDocument
  Dim HTMLTable As Object
  Dim TableRow As Object
  Dim TableCell As Object
  Dim myRow As Long
  
'Create a new instance of Internet Explorer
  Set IE = CreateObject("InternetExplorer.Application")
  IE.Visible = True
  
'Navigate to the website
  IE.Navigate "https://www.myfueltanksolutions.com/validate.asp"
  
'Wait for the page to finish loading
  Do While IE.ReadyState <> 4
    DoEvents
  Loop
  
'Set the document object
  Set Doc = IE.Document
  
'Fill in the security boxes
  Doc.all("CompanyID").Value = "ID"
  Doc.all("UserId").Value = "Username"
  Doc.all("Password").Value = "Password"
  
'Click the submit button
  Doc.all("btnSubmit").Click
  
'Wait for the page to finish loading
    Do Until IE.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
  
'Set the HTMLTable object
  Set HTMLTable = Doc.getElementById("RecentInventorylistform")
  
  
'Loop through each row in the table
  For Each TableRow In HTMLTable.getElementsByTagName("tr")
  
'Loop through each cell in the row
    For Each TableCell In TableRow.getElementsByTagName("td")
    
'Write the table cell value to the worksheet
      Worksheets("Sheet1").Range("A5").Offset(myRow, 0).Value = TableCell.innerText
      myRow = myRow   1
    
    Next TableCell
    
  Next TableRow
  
    Do Until IE.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    
'Log out and close website

    IE.Navigate ("https://www.myfueltanksolutions.com/signout.asp?action=rememberlogin")
    
    IE.Quit
  
End Sub

I have included the HTML code of the table I am trying to scrape on the re-directed page after login.HTML CODE

CodePudding user response:

I wont be tired to told it again and again and again and ... ;-)
Don't work with the IE anymore. MS is actively phasing it out!

But for explanation:

I'am sure, this is the code fragment which don't do what you expect:

...
...

'Wait for the page to finish loading
Do Until IE.ReadyState = READYSTATE_COMPLETE
  DoEvents
Loop

'Set the HTMLTable object
Set HTMLTable = Doc.getElementById("RecentInventorylistform")

...
...

Waiting for READYSTATE_COMPLETE doesn't work here (for which reasons ever). So the code will go on without a stop and doesn't load the new content. The use of getElementByID() ends up in the named error then because there is no element with that id.

Excursus for some get-methods of the DOM (Document Object Model):

The methods getElementsByTagName() and getElementsByClassName() will build a node collection which contains all elements with the given criterion. If you build a collection like that with getElementsByTagName("a") you get a collection with all anchor tags. Every element of the collection can be called with it's index like in an array. If you want to know how many elements are in a collection like that you can read the attribute length. If there is no element you ask for, in our example a-tags, the length will be 0. But the collection was build so you have an object.

The get-methods which build a collection have an s for plural in ...Elements... But getElementByID() has no s because an id can only be once in a html document. No collection needed here. The method getElementByID() always try to buld an object from the asked criterion. If there is no element like that you will get the error that there is no object.

How to solve the issue:

We must change the termination criterion and the body of the loop. We must ask again and again if the element with the wanted id is present. To do that we must use the given line:
Set HTMLTable = Doc.getElementById("RecentInventorylistform")

Like I said before there will be raising an error if it is not present. That's right. But with On Error Resume Next we can ignore any error in the code.

Attention!
Only use this in specific situations and switch back to error handling with On Error GoTo 0 after the critical part of code.

Replace the code I posted above in this answer with the following one:
(To avoid endless loops it is recommended to use a time out mechanism too. But I will keep it simple here.)

Do
  On Error Resume Next
  Set HTMLTable = Doc.getElementById("RecentInventorylistform")
  On Error GoTo 0
Loop While HTMLTable Is Nothing
  • Related