Web Scraping with Login MSXML2.XMLHTTP60

I'm a newbie at VBA. I'm trying to use a method I saw in this [tutorial][1] to access some data on the following webpage: https://my.powerdiary.com/Profile/Client/Search

However, the page requires a login and I'm not sure how to get past that. I've done my research, but from what I found online it's very hard for me to understand how I'm going to log in to that page.

Can anyone help me or guide me?

Option Explicit

Sub HTMLTest()

    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    
    Dim HTMLInputUser As MSHTML.IHTMLElement
    Dim HTMLInputPass As MSHTML.IHTMLElement

    Dim HTMLRoles As MSHTML.IHTMLElementCollection
    Dim HTMLRole As MSHTML.IHTMLElement
    
    XMLReq.Open "GET", "https://my.powerdiary.com/Profile/Client/Search", False, "email", "password"
    XMLReq.send
    
    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & "-" & XMLReq.statusText
        Exit Sub
    End If
        
    HTMLDoc.body.innerHTML = XMLReq.responseText
    
    Set HTMLRoles = HTMLDoc.getElementsByClassName("form-section-title mt10")
        
    Debug.Print HTMLRoles.Length
    
    For Each HTMLRole In HTMLRoles
        Debug.Print HTMLRole.tagName, HTMLRole.innerText, HTMLRole.innerHTML
        
    Next HTMLRole
    
End Sub

The screenshot I've added is from the page after logging in, so I'm trying to check whether I've logged in by trying to return those elements.

![enter image description here](https://i.stack.imgur.com/WeThW.png)

[1]: https://www.youtube.com/watch?v=sGw6r5GVA5g&ab_channel=WiseOwlTutorials

CodePudding user response:

It's really hard to offer a solution when there are no credentials to test with, so the solution I'm trying to provide is hypothetical.

According to the dev tools, you have to fill in the values of the first three of the following keys before sending them all as parameters with a POST request:

__RequestVerificationToken: 
UserName: 
Password: 
RememberMe: false
MobileSite: false

You need to send a GET request first to collect the value of the token. Once the fields are duly filled in, you need to URL-encode them.
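Excel 2013 and later expose Application.WorksheetFunction.EncodeURL for the encoding, and the script below relies on it. If your host application doesn't have that function, a small percent-encoder like this hypothetical UrlEncode helper (ASCII input only) can stand in for it:

Function UrlEncode(ByVal sText As String) As String
    'Hypothetical fallback for Application.WorksheetFunction.EncodeURL (Excel 2013+)
    'Handles ASCII input only; unreserved characters pass through untouched
    Dim i As Long, sChar As String
    For i = 1 To Len(sText)
        sChar = Mid$(sText, i, 1)
        Select Case AscW(sChar)
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                UrlEncode = UrlEncode & sChar
            Case Else
                UrlEncode = UrlEncode & "%" & Right$("0" & Hex$(AscW(sChar)), 2)
        End Select
    Next i
End Function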

This is how the script could look:

Sub ScrapeAfterLogIn()
    Const logInUrl = "https://my.powerdiary.com/Account/Login?ReturnUrl=/Profile/Client/Search"
    Const contentPage = "https://my.powerdiary.com/Profile/Client/Search"
    
    Dim oHttp As Object, Html As HTMLDocument, formData$, sToken$
    Dim sUser$, sPass$, sResp$

    Set Html = New HTMLDocument
    Set oHttp = CreateObject("MSXML2.XMLHTTP")

    sUser = ""     'your username
    sPass = ""     'your password

    With oHttp
        'GET the login page to pick up the anti-forgery token
        .Open "GET", logInUrl, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
        .send
        Html.body.innerHTML = .responseText
        sToken = Html.querySelector("input[name='__RequestVerificationToken']").getAttribute("value")

        'build the url-encoded form body
        formData = "__RequestVerificationToken=" & Application.WorksheetFunction.EncodeURL(sToken) & _
                   "&UserName=" & Application.WorksheetFunction.EncodeURL(sUser) & _
                   "&Password=" & Application.WorksheetFunction.EncodeURL(sPass) & _
                   "&RememberMe=false&MobileSite=false"

        'POST the credentials to log in
        .Open "POST", logInUrl, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
        .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
        .send formData
        
        'GET the protected page with the authenticated session
        .Open "GET", contentPage, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
        .send
        sResp = .responseText
    End With

    Debug.Print sResp
End Sub

If for some reason you don't get content from the desired page, you will want to handle cookies within the script to achieve that.
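Below is a minimal sketch of that idea, assuming the site keeps the session in cookies returned via Set-Cookie headers. MSXML2.XMLHTTP goes through WinInet and usually resends cookies on its own, whereas MSXML2.ServerXMLHTTP does not, so this sketch captures and replays them by hand; treat the object name and header parsing as illustrative rather than a tested recipe:

Sub ScrapeWithManualCookies()
    Const logInUrl = "https://my.powerdiary.com/Account/Login?ReturnUrl=/Profile/Client/Search"
    Const contentPage = "https://my.powerdiary.com/Profile/Client/Search"

    Dim oHttp As Object, vLine As Variant, sCookies$, formData$

    Set oHttp = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    formData = ""    'build the encoded body exactly as in the script above

    'POST the credentials (the token GET is omitted here for brevity)
    oHttp.Open "POST", logInUrl, False
    oHttp.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
    oHttp.send formData

    'collect every Set-Cookie header from the login response
    For Each vLine In Split(oHttp.getAllResponseHeaders, vbCrLf)
        If LCase$(vLine) Like "set-cookie:*" Then
            sCookies = sCookies & Split(Trim$(Mid$(vLine, 12)), ";")(0) & "; "
        End If
    Next vLine

    'replay the cookies when requesting the protected page
    oHttp.Open "GET", contentPage, False
    oHttp.setRequestHeader "Cookie", sCookies
    oHttp.send
    Debug.Print oHttp.responseText
End Sub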
