Basketball Reference Scraping with R

Time:11-14

I'm trying to scrape the website

https://www.basketball-reference.com/playoffs/NBA_2021_standings.html

for the Expanded Standings table. I have tried many variations using the rvest library but can't seem to get anything to work. The latest code used is:

url = "https://www.basketball-reference.com/playoffs/NBA_2021_standings.html"

test = url %>%
  rvest::read_html()  %>%
  rvest::html_nodes("table") %>%
  .[[1]] %>%
  rvest::html_table(header = FALSE)

Is there a way to scrape this table using rvest in RStudio?

CodePudding user response:

I haven't tried it with rvest, but I was scraping this site today and got it working with the httr and xml2 packages. Here is my code:

library(magrittr)

url = "https://www.basketball-reference.com/playoffs/NBA_2021_standings.html"

r_nba <- httr::GET(url,
                   httr::write_disk("nba_2021.html"))


# for Eastern table .conf == "E", and for Western .conf == "W"
.conf <- "E"
xpath <- glue::glue('//*[@id="confs_standings_{.conf}"]')
  
html <- 
    xml2::read_html(r_nba) %>%
    xml2::xml_find_first(xpath) %>% 
    rvest::html_table()
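
To fetch both conference tables in one pass, the XPath lookup above could be wrapped in a small function. This is only a sketch: `get_conf_table` is a hypothetical helper, and the inline `demo_page` is toy markup standing in for the downloaded page so that the example runs offline. It assumes the tables carry the ids `confs_standings_E` and `confs_standings_W`, as in the code above.

```r
library(rvest)

# Hypothetical helper: return the table whose id is confs_standings_<conf>,
# or NULL when the page has no such element.
get_conf_table <- function(page, conf) {
  xpath <- sprintf('//*[@id="confs_standings_%s"]', conf)
  node <- html_node(page, xpath = xpath)
  if (inherits(node, "xml_missing")) {
    return(NULL)
  }
  html_table(node)
}

# Toy markup (an assumption, not the real page source)
demo_page <- read_html('
<table id="confs_standings_E">
  <tr><th>Team</th><th>Overall</th></tr>
  <tr><td>Milwaukee Bucks</td><td>16-7</td></tr>
</table>
<table id="confs_standings_W">
  <tr><th>Team</th><th>Overall</th></tr>
  <tr><td>Phoenix Suns</td><td>14-8</td></tr>
</table>')

# Named list with one data frame per conference
standings <- lapply(c(E = "E", W = "W"), function(conf) get_conf_table(demo_page, conf))
```

With the real page, `demo_page` would be replaced by the result of `xml2::read_html(r_nba)`.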

CodePudding user response:

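I think this gets what you're after. The Expanded Standings table is wrapped in an HTML comment in the page source, so html_nodes("table") never sees it. The first pipeline below only returns the first visible table; the second pulls the comment nodes, takes the one that holds the table (the 29th on this page), and re-parses its text as HTML: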

library(rvest)
library(tidyverse)

url = "https://www.basketball-reference.com/playoffs/NBA_2021_standings.html"

page_html <- url %>%
  rvest::read_html()  

page_html %>%
  rvest::html_nodes("table") %>%
  .[[1]] %>%
  rvest::html_table(header = FALSE)


page_html %>% 
  html_nodes(xpath = '//comment()') %>% 
  .[29] %>% 
  html_text() %>%   
  paste(collapse = '') %>%   
  read_html() %>%   
  html_nodes('table') %>%    
  html_table() %>% 
  .[[1]]

# # A tibble: 17 x 18
#    ``    ``          ``    Place Place Conference Conference Division Division Division Division Division Division
#    <chr> <chr>       <chr> <chr> <chr> <chr>      <chr>      <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
#  1 Rk    Team        Over… Home  Road  "E"        "W"        "A"      "C"      "SE"     "NW"     "P"      "SW"    
#  2 1     Milwaukee … 16-7  10-1  6-6   "12-5"     "4-2"      "4-3"    ""       "8-2"    ""       "4-2"    ""      
#  3 2     Phoenix Su… 14-8  8-3   6-5   "2-4"      "12-4"     ""       "2-4"    ""       "4-0"    "8-4"    ""      
#  4 3     Atlanta Ha… 10-8  4-4   6-4   "10-8"     ""         "8-4"    "2-4"    ""       ""       ""       ""      
#  5 4     Los Angele… 10-9  5-5   5-4   ""         "10-9"     ""       ""       ""       "4-2"    "2-4"    "4-3"   
#  6 5     Brooklyn N… 7-5   6-1   1-4   "7-5"      ""         "4-1"    "3-4"    ""       ""       ""       ""      
#  7 6     Philadelph… 7-5   4-3   3-2   "7-5"      ""         ""       ""       "7-5"    ""       ""       ""   
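
The hard-coded index 29 will break if the page adds or removes a comment. A possibly more robust variant is to keep every comment that contains table markup and parse those together. The sketch below is one way to do that: `extract_commented_tables` is a hypothetical helper name, and `demo_html` is made-up markup used so the example runs without a network connection.

```r
library(rvest)

# Hypothetical helper: parse every <table> that is hidden inside an HTML comment.
# Returns a list of data frames named by each table's id attribute.
extract_commented_tables <- function(page) {
  comments <- html_text(html_nodes(page, xpath = "//comment()"))
  # Keep only the comments that actually contain table markup
  comments <- comments[grepl("<table", comments, fixed = TRUE)]
  if (length(comments) == 0) {
    return(list())
  }
  # Re-parse the comment text as HTML and extract the tables
  nodes <- html_nodes(read_html(paste(comments, collapse = "")), "table")
  tables <- html_table(nodes)
  names(tables) <- html_attr(nodes, "id")
  tables
}

# Toy markup (an assumption): one visible table and one commented-out table
demo_html <- '<html><body>
<table id="visible"><tr><th>a</th></tr><tr><td>1</td></tr></table>
<!-- <table id="hidden"><tr><th>Team</th><th>Overall</th></tr>
<tr><td>Milwaukee Bucks</td><td>16-7</td></tr></table> -->
</body></html>'

tables <- extract_commented_tables(read_html(demo_html))
tables[["hidden"]]
```

On the real page you would call `extract_commented_tables(page_html)` and inspect `names()` of the result to find the standings table, since the id it uses is not confirmed here.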
