I'm trying to scrape the Expanded Standings table from
https://www.basketball-reference.com/playoffs/NBA_2021_standings.html
I have tried many variations with the rvest package but can't get anything to work. The latest code I tried is:
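library(rvest)  # also makes %>% available
url <- "https://www.basketball-reference.com/playoffs/NBA_2021_standings.html"
# Parse the page, collect every <table> element, and read the first one
test <- url %>%
  rvest::read_html() %>%
  rvest::html_nodes("table") %>%
  .[[1]] %>%
  rvest::html_table(header = FALSE)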
Is there a way to scrape this table with rvest in R (RStudio)?
Answer:
I haven't tried it with rvest, but I was scraping this page today and got it working with the httr and xml2 packages. Here is my code:
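library(magrittr)
url <- "https://www.basketball-reference.com/playoffs/NBA_2021_standings.html"
# Download the page and keep a local copy (overwrite it on re-runs)
r_nba <- httr::GET(url,
                   httr::write_disk("nba_2021.html", overwrite = TRUE))
# For the Eastern table use .conf == "E", for the Western .conf == "W"
.conf <- "E"
# XPath selecting the element whose id is confs_standings_E (or _W)
xpath <- glue::glue('//*[@id="confs_standings_{.conf}"]')
# Parse the response, take the first matching node, and convert it to a tibble
conf_table <-
  xml2::read_html(r_nba) %>%
  xml2::xml_find_first(xpath) %>%
  rvest::html_table()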
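If you want both conferences at once, the same id-based lookup can be wrapped in a small function and applied to "E" and "W". The sketch below runs offline against a tiny hypothetical HTML string standing in for the downloaded page; the markup and team names are made up, and only the confs_standings_E / confs_standings_W ids come from the code above. The helper read_conf_table() is a hypothetical name, not part of any package.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page: two tables whose ids
# follow the confs_standings_<conf> pattern used in the answer above
toy_page <- '
<html><body>
  <table id="confs_standings_E">
    <tr><th>Team</th><th>W</th></tr>
    <tr><td>East Team</td><td>1</td></tr>
  </table>
  <table id="confs_standings_W">
    <tr><th>Team</th><th>W</th></tr>
    <tr><td>West Team</td><td>2</td></tr>
  </table>
</body></html>'

doc <- read_html(toy_page)

# Hypothetical helper: build the XPath for one conference and read its table
read_conf_table <- function(doc, conf) {
  xpath <- sprintf('//*[@id="confs_standings_%s"]', conf)
  node <- xml2::xml_find_first(doc, xpath)
  html_table(node)
}

# Named input keeps the result labelled as list(E = ..., W = ...)
standings <- lapply(c(E = "E", W = "W"), read_conf_table, doc = doc)
standings$E
```

On the real page you would replace doc with xml2::read_html(r_nba); whether both ids exist there is worth checking in the page source first.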
Answer:
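I think this gets what you're after. On this page the Expanded Standings table sits inside an HTML comment in the page source, so html_nodes("table") on the parsed page never finds it. The workaround is to collect the comment nodes, parse the text of the right one as HTML, and read the table from that: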
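library(rvest)
library(tidyverse)
url <- "https://www.basketball-reference.com/playoffs/NBA_2021_standings.html"
# Parse the page once and reuse it
page_html <- url %>%
  rvest::read_html()
# Tables that are part of the DOM can be read directly (this is what the
# question's code does), but Expanded Standings is not among them
page_html %>%
  rvest::html_nodes("table") %>%
  .[[1]] %>%
  rvest::html_table(header = FALSE)
# Expanded Standings is inside an HTML comment: select all comment nodes,
# take the one holding the table (index 29 here; this position is fragile
# and can shift if the page layout changes), parse its text as HTML,
# and read the table
page_html %>%
  html_nodes(xpath = '//comment()') %>%
  .[29] %>%
  html_text() %>%
  paste(collapse = '') %>%
  read_html() %>%
  html_nodes('table') %>%
  html_table() %>%
  .[[1]]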
# # A tibble: 17 x 18
# `` `` `` Place Place Conference Conference Division Division Division Division Division Division
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 Rk Team Over… Home Road "E" "W" "A" "C" "SE" "NW" "P" "SW"
# 2 1 Milwaukee … 16-7 10-1 6-6 "12-5" "4-2" "4-3" "" "8-2" "" "4-2" ""
# 3 2 Phoenix Su… 14-8 8-3 6-5 "2-4" "12-4" "" "2-4" "" "4-0" "8-4" ""
# 4 3 Atlanta Ha… 10-8 4-4 6-4 "10-8" "" "8-4" "2-4" "" "" "" ""
# 5 4 Los Angele… 10-9 5-5 5-4 "" "10-9" "" "" "" "4-2" "2-4" "4-3"
# 6 5 Brooklyn N… 7-5 6-1 1-4 "7-5" "" "4-1" "3-4" "" "" "" ""
# 7 6 Philadelph… 7-5 4-3 3-2 "7-5" "" "" "" "7-5" "" "" ""
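To avoid depending on the hard-coded .[29], you can pick the comment by its content instead, for example by looking for the table's id attribute. The sketch below demonstrates the idea offline on a tiny hypothetical page; the id "expanded_standings" is an assumption used only for illustration, so check the real page source for the actual id before using it. It uses html_elements()/html_element(), which need rvest 1.0.0 or later.

```r
library(rvest)

# Hypothetical page: one normal table plus one table hidden in a comment
toy_page <- '
<html><body>
  <table id="visible"><tr><th>A</th></tr><tr><td>1</td></tr></table>
  <!--
  <table id="expanded_standings">
    <tr><th>Rk</th><th>Team</th></tr>
    <tr><td>1</td><td>Team X</td></tr>
  </table>
  -->
</body></html>'

page <- read_html(toy_page)

# A normal selector only sees the uncommented table
length(html_elements(page, "table"))  # 1

# Text of every comment node, filtered to the one mentioning the id we want
comments <- html_text(html_elements(page, xpath = "//comment()"))
target <- comments[grepl('id="expanded_standings"', comments, fixed = TRUE)]

# Re-parse the comment text as HTML and read the table it contains
hidden <- read_html(paste(target, collapse = "")) %>%
  html_element("table") %>%
  html_table()
hidden
```

On the live page, swap toy_page for the URL (or page_html from the answer above); matching on content keeps working even if new comments shift the index.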