Using rvest to webscrape multiple pages

Time:09-29

I am trying to use rvest to extract all speeches given by Melania Trump from 2016-2020 at the following link: https://www.presidency.ucsb.edu/documents/presidential-documents-archive-guidebook/remarks-and-statements-the-first-lady-laura-bush. Here is my code so far:

# get main link
link <- "https://www.presidency.ucsb.edu/documents/presidential-documents-archive-guidebook/remarks-and-statements-the-first-lady-laura-bush"

# main page
page <- read_html(link)

# extract speech titles
title <- page %>% html_nodes("td.views-field-title") %>% html_text()
title_links = page %>% html_nodes("td.views-field-title") %>%
  html_attr("href") %>% paste("https://www.presidency.ucsb.edu/",., sep="")
title_links

# extract year of speech
year <- page %>% html_nodes(".date-display-single") %>% html_text()

# extract name of person giving speech
flotus <- page %>% html_nodes(".views-field-title-1.nowrap") %>% html_text()

get_text <- function(title_link){
  speech_page = read_html(title_links)
  speech_text = speech_page %>% html_nodes(".field-docs-content p") %>%
  html_text()  %>% paste(collapse = ",")
  return(speech_page)
}

text = sapply(title_links, FUN = get_text)

I am having trouble with the following lines of code:

title <- page %>% html_nodes("td.views-field-title") %>% html_text()
title_links = page %>% html_nodes("td.views-field-title") %>%
  html_attr("href") %>% paste("https://www.presidency.ucsb.edu/",., sep="")
title_links

In particular, title_links yields a series of links like this: "https://www.presidency.ucsb.eduNA", rather than the individual web pages. Does anyone know what I am doing wrong here? Any help would be appreciated.

CodePudding user response:

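You are querying the wrong CSS node. The selector "td.views-field-title" matches the table cell, which has no href attribute, so html_attr("href") returns NA for every row. The link lives on the a element inside that cell. Try: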

page %>% html_elements(css = "td.views-field-title a") %>% html_attr('href')


 [1] "https://www.presidency.ucsb.edu/documents/remarks-mrs-laura-bush-the-national-press-club"                                            
 [2] "https://www.presidency.ucsb.edu/documents/remarks-the-first-lady-un-commission-the-status-women-international-womens-day"            
 [3] "https://www.presidency.ucsb.edu/documents/remarks-the-first-lady-the-colorado-early-childhood-cognitive-development-summit"          
 [4] "https://www.presidency.ucsb.edu/documents/remarks-the-first-lady-the-10th-anniversary-the-holocaust-memorial-museum-and-opening-anne"
 [5] "https://www.presidency.ucsb.edu/documents/remarks-the-first-lady-the-preserve-america-initiative-portland-maine"  
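
To see why the original selector produced NA, here is a minimal offline sketch. The HTML snippet is a hypothetical stand-in for one row of the listing table, not the real page source, and "/documents/example-speech" is a made-up path. Since the output above suggests the hrefs may already be absolute, the sketch uses xml2::url_absolute() rather than paste(), because it resolves relative links and leaves absolute ones unchanged. The revised get_text() also addresses two further problems in the question's function: it read title_links (the whole vector) instead of its title_link argument, and it returned the parsed page instead of the extracted text.

```r
library(rvest)

# Hypothetical stand-in for one table row (assumption, not the real page source).
html <- read_html('<table><tr>
  <td class="views-field-title">
    <a href="/documents/example-speech">Example speech</a>
  </td>
</tr></table>')

# The <td> itself has no href attribute, so html_attr() returns NA.
td_href <- html %>% html_elements("td.views-field-title") %>% html_attr("href")

# The <a> inside the cell carries the link.
a_href <- html %>% html_elements("td.views-field-title a") %>% html_attr("href")

# Resolve against the site root; absolute URLs would pass through unchanged.
full_url <- xml2::url_absolute(a_href, "https://www.presidency.ucsb.edu/")

# Revised helper: read the single link passed in and return the text.
get_text <- function(title_link) {
  speech_page <- read_html(title_link)
  speech_page %>%
    html_elements(".field-docs-content p") %>%
    html_text() %>%
    paste(collapse = ",")
}

# Usage against the live site (requires network access, so left commented out):
# text <- sapply(full_url, get_text)
```

On the real page, the same two selectors applied to page should give NA for the first and the speech links for the second.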