I am trying to get data from a website (https://armstrade.sipri.org/armstrade/page/values.php) which requires submitting a form. There are some radio buttons and drop-down boxes where you can select a time period (years), countries, and a download method. I am aware that the data can be downloaded manually, but I would like to programmatically download the import data for all countries between 1990 and 2000.
I have tried two different approaches based on answers on SO (see below for code), but am having trouble getting either to actually produce results. Ideally, I would like a data frame similar to the one in the downloaded Excel file. Any help or guidance would be greatly appreciated.
Thank you in advance.
Approach 1
The first approach is based on Python code for the same site, from the question "Scrape a php webpage that needs a submitted form":
library(httr)
library(rvest)
df = httr::POST("https://armstrade.sipri.org/armstrade/html/export_values.php",
                encode = "form",
                body = list('import_or_export' = 'export',
                            'country_code' = 'All',
                            'from' = 1990,
                            'to' = 2000,
                            'summarize' = 'country',
                            'filetype' = 'excel',
                            'Action' = 'Download'),
                verbose())
Approach 2
The second approach I've tried is similar to the one in "How to retrieve response by using POST in R":
headers = c('Content-Type' = 'application/json; charset=UTF-8')
data = "{'country_code':'All','low_year':'1990','high_year':'2000','import_or_export':'import','summarize':'country','filetype':'html','Action':'Download'}"
r <- httr::POST(url = "https://armstrade.sipri.org/armstrade/html/export_values.php",
                httr::add_headers(.headers = headers), body = data)
CodePudding user response:
I'll leave the parsing and cleaning to you, but here's a suggestion for the request:
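library(tidyverse)
library(httr2)
library(rvest)
"https://armstrade.sipri.org/armstrade/html/export_values.php" %>%
  request() %>%
  # Send the fields form-encoded, as a browser would when submitting the form
  req_body_form(
    'import_or_export' = 'export',  # switch to 'import' for import data
    'country_code' = '',            # left empty rather than 'All'
    'low_year' = 1990,
    'high_year' = 2000,
    'summarize' = 'country',
    'filetype' = 'html',            # ask for HTML so the result can be parsed with rvest
    'Action' = 'Download'
  ) %>%
  req_perform() %>%
  resp_body_html() %>%
  html_table() %>%
  # The second table on the page holds the data
  getElement(2) %>%
  # Skip the first ten rows, which come before the year header
  slice(11:nrow(.))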
# A tibble: 89 x 14
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1   1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 Total NA
2 Angola     8                 8 NA
3 Argentina 6 0   13 5 5         2 31 NA
4 Aruba             18         18 NA
5 Australia 168 90   30 36 36 16 20 4     400 NA
6 Austria 30 20 20 10 17   18 1 29 23 24 191 NA
7 Belarus       8   7 129 398 63 452 293 1349 NA
8 Belgium 1 1     33 158 57 93 46 45 26 458 NA
9 Brazil 106 127 98 40 54 38 27 27 18     535 NA
10 Bulgaria 6 42 16 28 55 1 21 6 39 167 2 381 NA
# ... with 79 more rows
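For the cleaning step, here is a minimal sketch of one way to tidy a table shaped like the output above. It assumes the first row holds the header (a blank cell, the years, then "Total"), that the trailing all-NA column can be dropped, and that blank cells mean no value. The helper name clean_sipri_table is hypothetical, and the small raw table is typed by hand from the Brazil and Bulgaria rows printed above so the example runs without hitting the site.

```r
# Hypothetical helper: tidy a raw table shaped like the printed tibble above.
clean_sipri_table <- function(raw) {
  raw <- as.data.frame(raw, stringsAsFactors = FALSE)
  # Drop columns that are entirely NA (X14 in the printed output).
  raw <- raw[, colSums(!is.na(raw)) > 0, drop = FALSE]
  # Assumption: the first row is the header (blank, years, "Total").
  header <- trimws(unlist(raw[1, ], use.names = FALSE))
  header[1] <- "country"
  body <- raw[-1, , drop = FALSE]
  names(body) <- header
  # Keep only digits, dots and minus signs, then convert; blanks become NA.
  body[-1] <- lapply(body[-1], function(x) as.numeric(gsub("[^0-9.-]", "", x)))
  rownames(body) <- NULL
  body
}

# Hand-typed sample mirroring two rows of the output above.
raw <- as.data.frame(rbind(
  c("", 1990:2000, "Total", NA),
  c("Brazil", 106, 127, 98, 40, 54, 38, 27, 27, 18, "", "", 535, NA),
  c("Bulgaria", 6, 42, 16, 28, 55, 1, 21, 6, 39, 167, 2, 381, NA)
), stringsAsFactors = FALSE)

clean <- clean_sipri_table(raw)
str(clean)
```

If the assumptions hold for the real response, the same helper could be appended to the end of the pipeline above with one more %>% step. Stripping non-numeric characters is deliberately blunt, so it is worth checking a few rows against the Excel download before relying on it.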