Mismatch in results from POST and manual website download

Time: 10-04

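I am trying to use a script to download freshwater fish observations from https://nzffdms.niwa.co.nz/search, but the number of rows my POST request returns does not match a manual download from the website.

[Screenshot: form data submitted by the browser for a manual download]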

As you can see, apart from the CSRF token, every field the browser submits is an empty string ("") except download_format. Your code, however, passes start_year = 1850 and end_year = 2100. If we change search_terms to match exactly what the browser sends, we get the correct number of rows:

# fetch the search page and parse its HTML
get_doc <- function() {
  gr <- httr::GET("https://nzffdms.niwa.co.nz/search")
  xml2::read_html(httr::content(gr, "text"))
}

# extract the CSRF token from the form's hidden input
get_tok <- function() {
  xml2::xml_attr(xml2::xml_find_all(
    get_doc(),
    ".//input[@name='sample_search[_token]']"
  ), "value")
}

# build the form body: leave every field empty, exactly as the browser does
search_terms <- list(
  "sample_search[organisation]" = "",
  "sample_search[catchment_no_name]" = "",
  "sample_search[catchment_name]" = "",
  "sample_search[water_body]" = "",
  "sample_search[sample_method]" = "",
  "sample_search[start_year]" = "",
  "sample_search[end_year]" = "",
  "sample_search[download_format]" = "cde",
  "sample_search[submit]" = "",
  "sample_search[_token]" = get_tok())


# run search and stop on an HTTP error instead of parsing an error page
r <- httr::POST("https://nzffdms.niwa.co.nz/search",
                body = search_terms,
                encode = "form")
httr::stop_for_status(r)


# convert to dataframe
res <- utils::read.csv(text = httr::content(r, "text", encoding = "UTF-8"))

nrow(res)
#> [1] 154723
head(res)
#>   nzffdRecordNumber  eventDate eventTime institution       waterBody
#> 1                 1 1979-06-05     10:30        NIWA Limestone Creek
#> 2                 1 1979-06-05     10:30        NIWA Limestone Creek
#> 3                 1 1979-06-05     10:30        NIWA Limestone Creek
#> 4                 1 1979-06-05     10:30        NIWA Limestone Creek
#> 5                 1 1979-06-05     10:30        NIWA Limestone Creek
#> 6                 1 1979-06-05     10:30        NIWA Limestone Creek
#>   waterBodyType site catchmentNumber catchmentName eastingNZTM northingNZTM
#> 1   Not Entered              691.021       Hinds R     1463229      5157184
#> 2   Not Entered              691.021       Hinds R     1463229      5157184
#> 3   Not Entered              691.021       Hinds R     1463229      5157184
#> 4   Not Entered              691.021       Hinds R     1463229      5157184
#> 5   Not Entered              691.021       Hinds R     1463229      5157184
#> 6   Not Entered              691.021       Hinds R     1463229      5157184
#>   minimumElevation distanceOcean                  samplingMethod
#> 1              480            60 Electric fishing - Type unknown
#> 2              480            60 Electric fishing - Type unknown
#> 3              480            60 Electric fishing - Type unknown
#> 4              480            60 Electric fishing - Type unknown
#> 5              480            60 Electric fishing - Type unknown
#> 6              480            60 Electric fishing - Type unknown
#>   samplingProtocol              taxonName     taxonCommonName totalCount
#> 1          Unknown   Galaxias brevipinnis               Koaro         NA
#> 2          Unknown      Galaxias vulgaris Canterbury galaxias         NA
#> 3          Unknown      Carassius auratus            Goldfish         NA
#> 4          Unknown     Galaxias maculatus              Inanga         NA
#> 5          Unknown Gobiomorphus breviceps        Upland bully         NA
#> 6          Unknown  Salvelinus fontinalis          Brook char         NA
#>   present soughtNotDetected minLength maxLength dataVersion
#> 1    true             false        NA        NA          V1
#> 2    true             false        NA        NA          V1
#> 3    true             false        NA        NA          V1
#> 4    true             false        NA        NA          V1
#> 5    true             false        NA        NA          V1
#> 6    true             false        NA        NA          V1

Created on 2022-10-03 with reprex v2.0.2
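To make the comparison explicit, here is a minimal sketch using two hypothetical helpers, build_search_terms() and diff_form_fields() (neither is part of httr nor of the NZFFD site). The first builds the form body with every search field empty by default, as in the browser request; the second lists the fields where two bodies disagree. The field names are copied from search_terms above, the sketch runs offline with a dummy token, and the "question" body models only the two year fields mentioned in this answer.

```r
# Hypothetical helper (not part of httr or the NZFFD site): build the form
# body the way the browser submits it, with every search field empty and
# download_format set to "cde", unless a field is explicitly overridden.
build_search_terms <- function(token, overrides = list()) {
  fields <- c("organisation", "catchment_no_name", "catchment_name",
              "water_body", "sample_method", "start_year", "end_year",
              "download_format", "submit")
  body <- as.list(rep("", length(fields)))
  names(body) <- paste0("sample_search[", fields, "]")
  body[["sample_search[download_format]"]] <- "cde"
  for (nm in names(overrides)) {
    key <- paste0("sample_search[", nm, "]")
    if (!key %in% names(body)) stop("unknown form field: ", nm)
    body[[key]] <- as.character(overrides[[nm]])
  }
  body[["sample_search[_token]"]] <- token
  body
}

# Hypothetical helper: return the fields whose values differ between two
# form bodies (a field missing from one body is treated as NA).
diff_form_fields <- function(a, b) {
  keys <- union(names(a), names(b))
  get_val <- function(x, k) {
    if (is.null(x[[k]])) NA_character_ else as.character(x[[k]])
  }
  va <- vapply(keys, function(k) get_val(a, k), character(1), USE.NAMES = FALSE)
  vb <- vapply(keys, function(k) get_val(b, k), character(1), USE.NAMES = FALSE)
  differs <- (is.na(va) != is.na(vb)) | (!is.na(va) & !is.na(vb) & va != vb)
  data.frame(field = keys[differs], a = va[differs], b = vb[differs],
             stringsAsFactors = FALSE)
}

# Offline comparison with a dummy token: browser-style body vs. a body
# that also sets the year range used in the question.
browser_like  <- build_search_terms(token = "dummy-token")
question_like <- build_search_terms(token = "dummy-token",
                                    overrides = list(start_year = 1850,
                                                     end_year = 2100))
diff_form_fields(browser_like, question_like)
```

For the real request you would pass body = build_search_terms(get_tok()) to httr::POST(). Starting from empty defaults means any filter has to be added deliberately, which keeps the scripted download aligned with the manual one.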
