Creating Multiple Tables then Combining All of the Tables into One in R

Time:06-23

I have scraped multiple tables from a basketball site using a for loop.

library(rvest)

years <- 2016:2021

final_table <- NULL
for (i in seq_along(years)) {
  # Build the URL for one free-agency year and download the page
  url <- paste0("https://www.basketball-reference.com/friv/free_agents.cgi?year=", years[i])
  past_free_agency_page <- read_html(url)

  # Take the first table on the page
  past_free_agency_webtable <- html_nodes(past_free_agency_page, "table")
  past_free_agency_table <- html_table(past_free_agency_webtable, header = TRUE)[[1]]

  # Combine with the tables collected so far (this is the step that errors)
  final_table <- rbind(final_table, past_free_agency_table)
}

This retrieves each table correctly, but I also want to combine the tables as they are created. Note that there are six tables in total (years 2016 through 2021).

The first problem is an error: when I try to combine the tables with rbind() at the end of the loop, it fails with a message saying "the names do not match". I do not know a clean way to fix this because I am new to working with loops, and converting the scraped table to a data frame did not help.
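
A small reproduction of that error, using two made-up data frames rather than the scraped tables. It assumes the cause is that each year's table names its stats column after a different season (for example `2015-16 Stats` versus `2016-17 Stats`). The data and the names `t2016`, `t2017`, `res` and `combined` are illustrative only. As a sketch, `dplyr::bind_rows()` is one way around the mismatch, since it matches columns by name and fills missing ones with NA:

```r
# Two hypothetical tables standing in for two scraped years; the column
# names are assumptions chosen to mimic season-specific headers.
t2016 <- data.frame(Player = "A", `2015-16 Stats` = "10 Pts", check.names = FALSE)
t2017 <- data.frame(Player = "B", `2016-17 Stats` = "12 Pts", check.names = FALSE)

# Base rbind() needs the same column names in every data frame, so this
# call fails; tryCatch() captures the error message instead of stopping.
res <- tryCatch(rbind(t2016, t2017), error = function(e) conditionMessage(e))
print(res)

# bind_rows() matches columns by name and fills the gaps with NA.
combined <- dplyr::bind_rows(t2016, t2017)
print(combined)
```

If the column names really do differ by year, replacing rbind() with bind_rows() inside the loop should be enough to let the tables stack.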

The second issue concerns the contents of the tables. As the pages at those links show, each table contains header rows partway down that repeat the main header exactly. The code treats each of them as an ordinary row, so they show up as data in every table. I want these rows dropped.

The last issue is telling the rows apart: I want each table to carry its year in a column of its own. For example, the 2016 table should have a column containing 2016. I have tried things inside the loop such as past_free_agency_table[,1] <- c(years[i]). I want this because some players appear in more than one table, and I need to be able to tell which table each row came from.

CodePudding user response:

This is still a loop of sorts, but written the purrr way.

library(tidyverse)
library(rvest)

get_df <- function(year) {
  "https://www.basketball-reference.com/friv/free_agents.cgi?year=" %>%
    paste0(., year) %>%
    read_html() %>%
    html_table() %>%
    .[[1]] %>% 
    mutate(years = year) %>% 
    select(Rk, years, everything())
}

df <- map_dfr(2016:2020, get_df)

# A tibble: 1,161 × 16
   Rk    years Player    Pos   Age   Type  OTm   `2015-16 Stats` WS    NTm  
   <chr> <int> <chr>     <chr> <chr> <chr> <chr> <chr>           <chr> <chr>
 1 1      2016 Kevin Du… F-G   33-2… UFA   OKC   28.2 Pts, 8.2 … 14.5  GSW  
 2 2      2016 LeBron J… F-G   37-1… UFA   CLE   25.3 Pts, 7.4 … 13.6  CLE  
 3 3      2016 Hassan W… C     33-0… UFA   MIA   14.2 Pts, 11.8… 10.3  MIA  
 4 4      2016 DeMar De… G-F   32-3… UFA   TOR   23.5 Pts, 4.5 … 9.9   TOR  
 5 5      2016 Al Horfo… C-F   36-0… UFA   ATL   15.2 Pts, 7.3 … 9.4   BOS  
 6 6      2016 Marvin W… F     36-0… UFA   CHO   11.7 Pts, 6.4 … 7.8   CHA  
 7 7      2016 Andre Dr… C     28-3… RFA   DET   16.2 Pts, 14.8… 7.4   DET  
 8 8      2016 Pau Gasol C-F   41-3… UFA   CHI   16.5 Pts, 11.0… 7.1   SAS  
 9 9      2016 Dirk Now… F     44-0… UFA   DAL   18.3 Pts, 6.5 … 6.8   DAL  
10 10     2016 Dwight H… C     36-1… UFA   HOU   13.7 Pts, 11.8… 6.6   ATL  
# … with 1,151 more rows, and 6 more variables: Terms <chr>, Notes <chr>,
#   `2016-17 Stats` <chr>, `2017-18 Stats` <chr>, `2018-19 Stats` <chr>,
#   `2019-20 Stats` <chr>
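
The printed tibble still shows `Rk` as `<chr>`, which suggests that the repeated header rows mentioned in the question are still in the data. A minimal sketch of one way to drop them, using a made-up three-row table rather than the scraped one. The values and the names `raw` and `clean` are hypothetical, and it assumes each repeated header row has the literal text "Rk" in the `Rk` column:

```r
library(dplyr)

# Hypothetical miniature of one scraped table: the second row is a repeated
# header that came back as ordinary data.
raw <- data.frame(
  Rk     = c("1", "Rk", "2"),
  Player = c("Player A", "Player", "Player B"),
  WS     = c("14.5", "WS", "13.6")
)

clean <- raw %>%
  filter(Rk != "Rk") %>%                  # drop rows that repeat the header
  mutate(across(c(Rk, WS), as.numeric))   # numeric conversion is now safe

print(clean)
```

The same `filter()` step could go inside `get_df()`, right after `.[[1]]`, so that every year's table is cleaned before the tables are combined.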