R: Error in `html_form_submit()`: `form` doesn't contain a `action` attribute

Time:01-22

I'm trying to automate downloading the data offered by the form at https://www.offenerhaushalt.at/gemeinde/innsbruck/download

I can fairly easily specify the form values, either through URL query parameters like this: https://www.offenerhaushalt.at/gemeinde/innsbruck/download?year=2022&haushalt=fhh&rechnungsabschluss=va&origin=gemeinde

Or through the rvest function html_form(), but I cannot submit the form because html_form_submit() throws this error:

Error in `submission_build()`:
! `form` doesn't contain a `action` attribute

library(rvest)
library(tidyverse)
html_form(read_html("https://www.offenerhaushalt.at/gemeinde/innsbruck/download"))[[1]] %>% 
    html_form_set(year = "2022", 
                  haushalt = "fhh",
                  rechnungsabschluss = "va",
                  origin = "gemeinde") %>% 
    html_form_submit()

Any ideas on how to capture the file that is generated afterwards and download it?

It seems to me that the form is actually submitted to a URL that looks like: https://www.offenerhaushalt.at/downloads/ghdByParams

But I'm not sure what to do with that.

Thanks all!

Answer:

The form's HTML has no action attribute, so rvest cannot tell where to send it. You can set the method and action URL for that form manually:

library(rvest)
library(purrr)
dl_url <- "https://www.offenerhaushalt.at/gemeinde/innsbruck/download"

sess <- session(dl_url)
form <- sess %>% read_html() %>% html_form() %>% .[[1]]

# list valid options for select boxes
map(form$fields, "options") %>% keep(~ length(.x) > 0) %>% 
  imap_dfr(~ list(field = .y, options = paste(.x, collapse = " ")))
#> # A tibble: 4 × 2
#>   field              options                                                    
#>   <chr>              <chr>                                                      
#> 1 haushalt           default fhh ehh vhh                                        
#> 2 rechnungsabschluss default ra va                                              
#> 3 year               default 2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 …
#> 4 origin             default statistik_at gemeinde

# set values
form$fields$haushalt$value <- "fhh"
form$fields$rechnungsabschluss$value <- "ra"
form$fields$year$value <- "2020"
form$fields$origin$value <- "statistik_at"

# manually set form method & action
form$method <- "POST"
form$action <- "https://www.offenerhaushalt.at/downloads/ghdByParams"

# submit form
sess <- session_submit(sess, form)

# response headers
imap_dfr(sess$response$headers, ~ list(header = .y, value = .x))
#> # A tibble: 10 × 2
#>    header              value                                                    
#>    <chr>               <chr>                                                    
#>  1 date                Sat, 21 Jan 2023 01:47:13 GMT                            
#>  2 server              Apache                                                   
#>  3 content-disposition attachment; filename=offenerhaushalt_70101_2020_ra_fhh.c…
#>  4 pragma              no-cache                                                 
#>  5 cache-control       must-revalidate, post-check=0, pre-check=0, private      
#>  6 expires             0                                                        
#>  7 set-cookie          XSRF-TOKEN=eyJpdiI6IjdHd2pSakwzV09xb3Jab05zXC81em1RPT0iL…
#>  8 set-cookie          offener_haushalt_session=eyJpdiI6IjI5cUN5MGhCSmVadmN5enV…
#>  9 transfer-encoding   chunked                                                  
#> 10 content-type        text/csv; charset=UTF-8

# parse attached CSV
httr::content(sess$response, as = "text") %>% readr::read_csv2()
#> ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
#> Rows: 1408 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> chr (8): ansatz_uab, ansatz_ugl, konto_grp, konto_ugl, sonst_ugl, vorhabenco...
#> dbl (2): mvag, wert
#> lgl (1): verguetung
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1,408 × 11
#>    ansat…¹ ansat…² konto…³ konto…⁴ sonst…⁵ vergu…⁶ vorha…⁷  mvag ansat…⁸ konto…⁹
#>    <chr>   <chr>   <chr>   <chr>   <chr>   <lgl>   <chr>   <dbl> <chr>   <chr>  
#>  1 000     000     042     000     000     NA      0000000  3415 Gewähl… Amts-,…
#>  2 000     000     070     000     000     NA      0000000  3411 Gewähl… Aktivi…
#>  3 000     000     400     000     000     NA      0000000  3221 Gewähl… Gering…
#>  4 000     000     413     000     000     NA      0000000  3221 Gewähl… Handel…
#>  5 000     000     456     000     000     NA      0000000  3221 Gewähl… Schrei…
#>  6 000     000     457     000     000     NA      0000000  3221 Gewähl… Druckw…
#>  7 000     000     459     000     000     NA      0000000  3221 Gewähl… Sonsti…
#>  8 000     000     618     000     000     NA      0000000  3224 Gewähl… Instan…
#>  9 000     000     621     000     000     NA      0000000  3222 Gewähl… Sonsti…
#> 10 000     000     631     000     000     NA      0000000  3222 Gewähl… Teleko…
#> # … with 1,398 more rows, 1 more variable: wert <dbl>, and abbreviated variable
#> #   names ¹​ansatz_uab, ²​ansatz_ugl, ³​konto_grp, ⁴​konto_ugl, ⁵​sonst_ugl,
#> #   ⁶​verguetung, ⁷​vorhabencode, ⁸​ansatz_text, ⁹​konto_text
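
The content-disposition header shown above carries the server's suggested file name, which is handy if you want to save the attachment under that name. A minimal sketch, assuming the header uses the plain `filename=...` form (quoted or unquoted); `filename_from_disposition()` and its `default` argument are made-up names for illustration, not part of rvest or httr, and the `filename*=` variant is not handled:

```r
# Hypothetical helper: extract the file name from a Content-Disposition value.
# Falls back to `default` when the header is missing or has no filename part.
filename_from_disposition <- function(x, default = "download.csv") {
  if (is.null(x)) return(default)
  # Capture everything after `filename=` up to a quote or semicolon
  m <- regmatches(x, regexec('filename="?([^";]+)"?', x))[[1]]
  if (length(m) == 2) m[2] else default
}

# Illustrative header values (not taken from the real response)
filename_from_disposition("attachment; filename=example.csv")
filename_from_disposition("inline")

# With the session from above, something like this should work:
# filename_from_disposition(sess$response$headers[["content-disposition"]])
```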

As session_submit() passes additional arguments on to httr, attached files can be saved directly to disk too:

dest_file <- tempfile(fileext = ".csv")
session_submit(sess, form, submit = NULL, httr::write_disk(dest_file))
# browseURL(dirname(dest_file))
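
If you need several years or budget types, the steps above can be wrapped in a function. This is only a sketch under the same assumptions as the answer (the first form on the page is the download form, and the action URL is the one from the question); `patch_form()` and `download_budget()` are hypothetical names, and the option check relies on the same `form$fields$...$options` structure used earlier:

```r
library(rvest)

# Hypothetical helper: fill form fields and patch in the method and action
# that the page's HTML omits. Action URL is the one quoted in the question.
patch_form <- function(form, values,
                       action = "https://www.offenerhaushalt.at/downloads/ghdByParams",
                       method = "POST") {
  for (nm in names(values)) {
    field <- form$fields[[nm]]
    if (is.null(field)) stop("Unknown field: ", nm)
    # For select boxes, reject values the form does not offer
    if (length(field$options) > 0 && !values[[nm]] %in% field$options) {
      stop("Invalid value for ", nm, ": ", values[[nm]])
    }
    form$fields[[nm]]$value <- values[[nm]]
  }
  form$method <- method
  form$action <- action
  form
}

# Hypothetical wrapper around the steps above; needs network access.
# Returns the path of the saved CSV.
download_budget <- function(year, haushalt = "fhh", rechnungsabschluss = "ra",
                            origin = "statistik_at",
                            dest_file = tempfile(fileext = ".csv")) {
  sess <- session("https://www.offenerhaushalt.at/gemeinde/innsbruck/download")
  form <- patch_form(
    html_form(read_html(sess))[[1]],
    list(year = as.character(year), haushalt = haushalt,
         rechnungsabschluss = rechnungsabschluss, origin = origin)
  )
  session_submit(sess, form, submit = NULL, httr::write_disk(dest_file))
  dest_file
}

# Example call (not run here):
# files <- vapply(2018:2020, download_budget, character(1))
```

Checking values against the select options up front means a typo such as a year the site does not list fails locally instead of producing an unexpected response from the server.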