How to download a file from a website using R

Time:11-08

Please help! I want to download the *.xls file from this website (https://echa.europa.eu/candidate-list-table) using R. The file can easily be downloaded on the website by clicking the "XLS" button. However, no "copy link location" option is available in the right-click menu. I tried to use the rvest package, following https://www.edureka.co/community/57163/download-file-from-website-using-web-scraping, but the structure of the webpage is not the same as in that example. There is no <a> tag and no href attribute associated with the XLS button.

Any clues? Many thanks in advance.

Best regards, Sukis

CodePudding user response:

You can spot the request in the Network tab of Chrome DevTools. When you download the file, the page makes a call to:

POST https://echa.europa.eu/candidate-list-table

This call does not seem to need a cookie, so you can simply send the form data together with the query parameters. The following will save the file as test.xls:

library(httr)

# Export endpoint seen in the network tab; the query string asks for the "exportResults" resource
url <- 'https://echa.europa.eu/candidate-list-table?p_p_id=disslists_WAR_disslistsportlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_resource_id=exportResults&p_p_cacheability=cacheLevelPage'

# Send the same form fields the page submits when the "XLS" button is clicked
r <- POST(url, body = list(
  # Current time in milliseconds since the Unix epoch, as a plain digit string
  "_disslists_WAR_disslistsportlet_formDate" = sprintf("%.0f", as.numeric(Sys.time()) * 1000),
  "_disslists_WAR_disslistsportlet_exportColumns" = "name,ecNumber,casNumber,haz_detailed_concern,dte_inclusion,doc_cat_decision,doc_cat_iuclid_dossier,doc_cat_supdoc,doc_cat_rcom,prc_external_remarks",
  "_disslists_WAR_disslistsportlet_orderByCol" = "dte_inclusion",
  "_disslists_WAR_disslistsportlet_orderByType" = "desc",
  "_disslists_WAR_disslistsportlet_searchFormColumns" = "haz_detailed_concern,dte_inclusion",
  "_disslists_WAR_disslistsportlet_searchFormElements" = "DROP_DOWN,DATE_PICKER",
  "_disslists_WAR_disslistsportlet_substance_identifier_field_key" = "",
  "_disslists_WAR_disslistsportlet_haz_detailed_concern" = "",
  "_disslists_WAR_disslistsportlet_dte_inclusionFrom" = "",
  "_disslists_WAR_disslistsportlet_dte_inclusionTo" = "",
  # Value captured from the browser request; it may differ when you run this
  "_disslists_WAR_disslistsportlet_total" = "219",
  "_disslists_WAR_disslistsportlet_exportType" = "xls"
), verbose(), write_disk("test.xls", overwrite = TRUE))  # verbose() prints the exchange; write_disk() saves the body
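
It may be worth checking that the request actually produced a file before opening it. Below is a minimal sketch, assuming `r` is the response object and test.xls the output path from the call above. The helper names `epoch_ms`, `file_has_content` and `check_download` are hypothetical and only for illustration; the httr calls inside `check_download` (`stop_for_status`, `headers`) are the only parts that need the package.

```r
# Hypothetical helper: time as whole milliseconds since the Unix epoch,
# formatted as a digit string (no scientific notation), e.g. for the formDate field.
epoch_ms <- function(t = Sys.time()) {
  sprintf("%.0f", as.numeric(t) * 1000)
}

# Hypothetical helper: TRUE if the file exists and is not empty.
file_has_content <- function(path) {
  file.exists(path) && file.info(path)$size > 0
}

# Hypothetical helper: stop on an HTTP error status, then report what came back.
# Assumes `r` is an httr response and `path` is where write_disk() saved the body.
check_download <- function(r, path) {
  httr::stop_for_status(r)  # raises an R error for 4xx/5xx responses
  list(
    content_type = httr::headers(r)[["content-type"]],
    has_content  = file_has_content(path)
  )
}
```

After the POST, something like `check_download(r, "test.xls")` should show the content type the server reported and whether the file is non-empty. If the content type looks like HTML rather than a spreadsheet, the form fields probably need another look in the network tab.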