I wanted to scrape this webpage:
http://protected.to/f-42cbf8ce2521d615
but I have to click on "continue to folder" to get to the links. I cannot see these links in the HTML source; they only appear after I physically click the "continue to folder" button with a mouse.
How can I avoid that physical click and still get to those links on the website?
I am new to web scraping so please guide me as to how I can go about solving this issue.
Thanks for your attention and time.
Ozooha
CodePudding user response:
"Continue to Folder" is a submit button for the form which POSTs the "__RequestVerificationToken" value and the slug token to the page to display the contents of the folder.
So, in theory, you have to parse the HTML at http://protected.to/f-42cbf8ce2521d615 and extract the value of the hidden field "__RequestVerificationToken" (that is the name of the input holding the token value). To obtain the slug token, look between the tags; you will see that the page dynamically creates a slug token when it loads.
Once you have those values, make a POST to the same URL, http://protected.to/f-42cbf8ce2521d615, with the token and slug. The body will look something like this: __RequestVerificationToken=8BYeNPftVEEivO2imhtWIuWAb0mjhPg-5pAhq1mlpL_pTyYR1AyScbfqB8QZDudwGY_1LkV79FCDgpyffRPuktApd2ZQYBdi2ySA5ATUZ601&Slug=42cbf8ce2521d615
That request returns the page with the folder contents. You can reproduce this by opening the browser's dev tools and inspecting what happens when you hit 'Continue to Folder': you will see the POST and its body, along with the page elements that hold the values needed to make the call (the verification token and slug token).
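The steps above could be sketched roughly as follows for Node 18+ (which ships a global fetch). This is only an illustration under assumptions: the helper names extractHiddenValue, buildFormBody and fetchFolder are made up here, the regex-based extraction is a shortcut rather than a real HTML parser, and it assumes the slug sits in a hidden input named "Slug" (the answer only says it is somewhere in the page). Some sites also tie the verification token to a cookie, so the sketch forwards any cookies from the first response where the runtime exposes them.

```javascript
// Sketch only: field names come from the answer above; helper names,
// the regex, and the location of the Slug input are assumptions.

// Pull the value of an <input> with the given name out of raw HTML.
function extractHiddenValue(html, name) {
  const tag = html.match(new RegExp(`<input[^>]*name=["']${name}["'][^>]*>`, 'i'));
  if (!tag) return null;
  const value = tag[0].match(/value=["']([^"']*)["']/i);
  return value ? value[1] : null;
}

// Build the URL-encoded body that the "Continue to Folder" form would submit.
function buildFormBody(token, slug) {
  return new URLSearchParams({ __RequestVerificationToken: token, Slug: slug }).toString();
}

// GET the page, extract both values, then POST them back to the same URL.
async function fetchFolder(url) {
  const first = await fetch(url);
  const html = await first.text();
  const token = extractHiddenValue(html, '__RequestVerificationToken');
  const slug = extractHiddenValue(html, 'Slug'); // assumed field location
  if (!token || !slug) throw new Error('token or slug not found in page');

  const headers = { 'Content-Type': 'application/x-www-form-urlencoded' };
  // Forward cookies from the first response, in case the token is paired with one.
  if (typeof first.headers.getSetCookie === 'function') {
    const cookies = first.headers.getSetCookie().map((c) => c.split(';')[0]).join('; ');
    if (cookies) headers.Cookie = cookies;
  }

  const second = await fetch(url, { method: 'POST', headers, body: buildFormBody(token, slug) });
  return second.text(); // HTML of the folder page, ready to be parsed for links
}
```

Calling fetchFolder('http://protected.to/f-42cbf8ce2521d615') should then resolve to the HTML that the button click would have produced, provided the assumptions above hold for this site.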
CodePudding user response:
You could use a heavier library that behaves like a real user, such as Selenium. But I would simply call .click() on the button from within the page and then parse the HTML.
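// Run in the page context (browser console or an automated browser).
const button = document.querySelector('[value="Continue to folder"]');
if (button) button.click(); // submits the form; does nothing if the selector matched no element
// Parse the HTML once the folder page has loaded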