Home > other >  API Webscrape OpenFDA with R
API Webscrape OpenFDA with R

Time:02-04

I am scraping OpenFDA (https://open.fda.gov/apis). I know my particular inquiry has 6974 hits, which is organized into 100 hits per page (max download of the API). I am trying to use R (rvest, jsonlite, purr, tidyverse, httr) to download all of this data.

I checked the website information with curl in terminal and downloaded a couple of sites to see a pattern.

I've tried a few lines of code and I can only get 100 entries to download. This code seems to work decently, but it will only pull 100 entries, so one page To skip the fisrt 100, which I can pull down and merge later, here is the code that I have used:

url_json <- "https://api.fda.gov/drug/label.json?api_key=YOULLHAVETOGETAKEY&search=grapefruit&limit=100&skip=6973"

raw_json <- httr::GET(url_json, accept_json())
data<- httr::content(raw_json, "text")
my_content_from_json <- jsonlite::fromJSON(data)
dplyr::glimpse(my_content_from_json)
dataframe1 <- my_content_from_json$results
view(dataframe1)


SOLUTION below in the responses. Thanks!

CodePudding user response:

From the comments:

It looks like the API parameters skip and limit work better than the search_after parameter. They allow pulling down 1,000 entries simultaneously according to the documentation (open.fda.gov/apis/query-parameters). To provide these parameters into the query string, an example URL would be

https://api.fda.gov/drug/label.json?api_key=YOULLHAVETOGETAKEY&search=grapefruit&limit=1000&skip=0

after which you can loop to get the remaining entries with skip=1000, skip=2000, etc. as you've done above.

  •  Tags:  
  • rapi
  • Related