Please help!
I want to download the *.xls file from this website (https://echa.europa.eu/candidate-list-table) using R. On the website, the file can be downloaded easily by clicking the "XLS" button. However, the right-click menu offers no "copy link location" option. I tried the rvest package, following https://www.edureka.co/community/57163/download-file-from-website-using-web-scraping, but the structure of this page differs from that example: there is no <a> tag and no href attribute associated with the XLS button.
Any clues? Many thanks in advance.
Best regards, Sukis
CodePudding user response:
You can spot the request in the Network tab of Chrome DevTools. When you download the file, the browser makes this call:
POST https://echa.europa.eu/candidate-list-table
It seems no cookie is needed for this call, so you can just send the form data and include the query parameters in the URL. The following saves the file as test.xls:
library(httr)

# Export endpoint: the page URL plus the portlet query parameters seen in DevTools
url <- 'https://echa.europa.eu/candidate-list-table?p_p_id=disslists_WAR_disslistsportlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_resource_id=exportResults&p_p_cacheability=cacheLevelPage'

# Form fields copied from the browser request
r <- POST(url, body = list(
  # current time in milliseconds since the epoch (Sys.time() is already POSIXct)
  "_disslists_WAR_disslistsportlet_formDate" = as.numeric(Sys.time()) * 1000,
  "_disslists_WAR_disslistsportlet_exportColumns" = "name,ecNumber,casNumber,haz_detailed_concern,dte_inclusion,doc_cat_decision,doc_cat_iuclid_dossier,doc_cat_supdoc,doc_cat_rcom,prc_external_remarks",
  "_disslists_WAR_disslistsportlet_orderByCol" = "dte_inclusion",
  "_disslists_WAR_disslistsportlet_orderByType" = "desc",
  "_disslists_WAR_disslistsportlet_searchFormColumns" = "haz_detailed_concern,dte_inclusion",
  "_disslists_WAR_disslistsportlet_searchFormElements" = "DROP_DOWN,DATE_PICKER",
  # empty search filters, i.e. export the whole list
  "_disslists_WAR_disslistsportlet_substance_identifier_field_key" = "",
  "_disslists_WAR_disslistsportlet_haz_detailed_concern" = "",
  "_disslists_WAR_disslistsportlet_dte_inclusionFrom" = "",
  "_disslists_WAR_disslistsportlet_dte_inclusionTo" = "",
  # value sent by the browser when this was captured; it may differ when you run it
  "_disslists_WAR_disslistsportlet_total" = "219",
  "_disslists_WAR_disslistsportlet_exportType" = "xls"
), verbose(), write_disk("test.xls", overwrite = TRUE))  # verbose() logs the request; write_disk() saves the response body
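
Since write_disk() writes whatever the server returns, it may be worth checking the result before opening the file. Here is a minimal sketch, assuming the response object r and the file name test.xls from the code above. The helper name check_download is hypothetical and is not part of httr; stop_for_status() is the httr function that raises an error for HTTP status codes of 300 and above.

```r
library(httr)

# Hypothetical helper (not part of httr): fails loudly if the request
# returned an HTTP error or if nothing was written to disk.
check_download <- function(response, path) {
  # httr::stop_for_status() turns an HTTP error status into an R error
  stop_for_status(response)

  # file.size() returns NA when the file does not exist
  size <- file.size(path)
  if (is.na(size) || size == 0) {
    stop("Downloaded file is missing or empty: ", path)
  }

  message("Saved ", size, " bytes to ", path)
  invisible(size)
}
```

After the POST call you would run check_download(r, "test.xls"). This only confirms that the request succeeded and that a non-empty file exists; it does not confirm that the content is a valid spreadsheet.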