I'm trying to scrape a website that requires logging in to a session, using rvest.
I'm using this code to begin:
library(rvest)
login <- "https://www.drugs.com/account/login/"
session <- html_session(login)
form <- html_form(session)
But even after extracting all forms, it only recognizes the "Advanced Search" form and not the login form.
Do you have any idea why this happens? I was wondering whether the login form requires JavaScript or something like that.
Thank you, Vitruves
CodePudding user response:
Depending on where you are, I believe the problem may be the EU GDPR consent. The first time I opened the website it asked me to accept cookies in order to log in. Accepting set the following cookie in my browser:
ddbab21688799cacb48f7d384642573f = "agree"
and only then was the log-in form displayed. For me the cookie name was always the same, but if that is not always the case, you may have to accept the consent prompt within your rvest session to have the cookie set.
If I set the cookie when opening the rvest session, I get two forms returned, one of which is the log-in form.
You can set the cookie as follows:
library(rvest)
login <- "https://www.drugs.com/account/login/"
# In rvest >= 1.0.0, html_session() is deprecated in favour of session()
session <- html_session(login, httr::set_cookies(ddbab21688799cacb48f7d384642573f = "agree"))
form <- html_form(session)
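Since two forms come back, you still need to pick out the log-in one. Here is a minimal sketch of one way to do that, assuming rvest >= 1.0.0 (where `session()` replaces `html_session()` and each form stores its inputs in `$fields`). The helper names `has_password_field`, `find_login_form` and `open_login_form` are my own, not part of rvest, and the cookie name is simply the one I observed above, so it may differ for you.

```r
library(rvest)

# Hypothetical helper (not part of rvest): does this form contain a password input?
has_password_field <- function(form) {
  any(vapply(form$fields, function(f) identical(f$type, "password"), logical(1)))
}

# Hypothetical helper: return the first form with a password field, or NULL if none.
find_login_form <- function(forms) {
  hits <- Filter(has_password_field, forms)
  if (length(hits) == 0) NULL else hits[[1]]
}

# Open the page with the consent cookie already set, then look for the log-in form.
# The cookie name is the one seen in my browser; treat it as an assumption.
open_login_form <- function(url) {
  s <- session(url, httr::set_cookies(ddbab21688799cacb48f7d384642573f = "agree"))
  list(session = s, form = find_login_form(html_form(s)))
}
```

Calling `res <- open_login_form(login)` and printing `res$form` should show the actual field names. From there you would fill the form with `html_form_set()` and send it with `session_submit(res$session, ...)`. I have not checked what the fields are called on this site, so read them off the printed form rather than guessing.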