I want to scrape all the division links from a website, but I keep getting NAs. Any ideas on a fix?
library(rvest)
library(tidyverse)
pageMen = read_html('https://www.bjjcompsystem.com/tournaments/1869/categories')
get_links <- pageMen %>% html_nodes('.panel-default') %>% html_attr('href')
get_links
By adjusting the above, I managed to scrape one link, but when I inspect the elements I can't find where all the other links are contained:
get_links <- pageMen %>%
  html_elements(xpath = '/html/body/div[3]/div/div/div/ul/li[1]/a') %>%
  html_attr('href') %>%
  paste0('https://www.bjjcompsystem.com', .)
get_links
Answer:
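You get NAs because '.panel-default' matches container elements that have no href attribute, and html_attr() returns NA when an element lacks the attribute. Select the <a> elements inside each '.categories-grid__category' item instead: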
library(rvest)
library(tidyverse)
pageMen = read_html('https://www.bjjcompsystem.com/tournaments/1869/categories')
get_links <- pageMen %>%
  html_nodes('.categories-grid__category a') %>%
  html_attr('href') %>%
  paste0('https://www.bjjcompsystem.com', .)
get_links[1:5]
#> [1] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053146"
#> [2] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053150"
#> [3] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053154"
#> [4] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053158"
#> [5] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053162"
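To see these behaviours without hitting the live site, here is a minimal sketch that runs against a small hypothetical HTML snippet. The markup is an assumption: it only reuses the class names from the question and answer, and the real page's structure may differ. It shows that the panel container has no href (hence NA), that the li[1] predicate in the question's XPath keeps only the first list item, and that selecting the descendant a elements returns every link. It also swaps in xml2::url_absolute() as an alternative to paste0() for building full URLs.

```r
library(rvest)  # re-exports read_html() and %>%

# Hypothetical markup: a simplified stand-in for the categories page that
# reuses the class names from the question and answer (not the real page)
page <- read_html('
<html><body>
  <div class="panel panel-default">
    <ul>
      <li class="categories-grid__category"><a href="/tournaments/1869/categories/1">Category 1</a></li>
      <li class="categories-grid__category"><a href="/tournaments/1869/categories/2">Category 2</a></li>
      <li class="categories-grid__category"><a href="/tournaments/1869/categories/3">Category 3</a></li>
    </ul>
  </div>
</body></html>')

# The panel <div> has no href attribute, so html_attr() returns NA
panel_href <- page %>% html_elements(".panel-default") %>% html_attr("href")
print(panel_href)

# The positional predicate li[1] keeps only the first <li> of the list
first_only <- page %>% html_elements(xpath = "//ul/li[1]/a") %>% html_attr("href")
print(first_only)

# Selecting the <a> inside every category item returns all the hrefs
rel_links <- page %>%
  html_elements(".categories-grid__category a") %>%
  html_attr("href")

# Resolve the relative paths against the site root
links <- xml2::url_absolute(rel_links, "https://www.bjjcompsystem.com")
print(links)
```

paste0() is enough when every href is a root-relative path, as in the answer's output; url_absolute() also leaves already-absolute hrefs unchanged, which helps if a page mixes the two forms.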