How to scrape all links from a page in R


I want to scrape all the division links from a website, but I keep getting NAs. Any ideas on a fix?

library(rvest)
library(tidyverse)

pageMen = read_html('https://www.bjjcompsystem.com/tournaments/1869/categories')

get_links <- pageMen %>% html_nodes('.panel-default') %>% html_attr('href')
get_links

By adjusting the above, I managed to scrape one link, but I cannot find where all the other links are contained when I inspect the elements.

get_links <- pageMen %>% 
  html_elements(xpath = '/html/body/div[3]/div/div/div/ul/li[1]/a') %>% 
  html_attr('href') %>% 
  paste0('https://www.bjjcompsystem.com', .)
get_links

CodePudding user response:

You could do

library(rvest)
library(tidyverse)

pageMen = read_html('https://www.bjjcompsystem.com/tournaments/1869/categories')

get_links <- pageMen %>% 
  html_nodes('.categories-grid__category a') %>%   # select the <a> elements inside each category block
  html_attr('href') %>%                            # pull out each link's (relative) href
  paste0('https://www.bjjcompsystem.com', .)       # prepend the site root to make the URLs absolute

get_links[1:5]
#> [1] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053146"
#> [2] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053150"
#> [3] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053154"
#> [4] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053158"
#> [5] "https://www.bjjcompsystem.com/tournaments/1869/categories/2053162"
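
As for why the first attempt returned NA: '.panel-default' matches the panel containers themselves, which most likely carry no href attribute, and html_attr('href') returns NA for nodes without that attribute; the links sit on the <a> elements nested inside the panels. As a minimal alternative sketch (assuming the same '.categories-grid__category a' selector as above), xml2's url_absolute() resolves the relative hrefs instead of pasting the site root by hand:

library(rvest)
library(xml2)

pageMen = read_html('https://www.bjjcompsystem.com/tournaments/1869/categories')

get_links <- pageMen %>%
  html_elements('.categories-grid__category a') %>%  # same selector as in the answer above
  html_attr('href') %>%
  url_absolute('https://www.bjjcompsystem.com')       # resolve relative hrefs against the site root

get_links[1:5]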