I'm scraping stats from a website, but on certain stat pages I hit the following error:
Error: Column avg can't be converted from numeric to character
I tried something like mutate(avg = avg %>% as.numeric), but then I get an error saying the column avg can't be found.
The error occurs in the code below whenever I include stat_id 336 or 340. Any ideas on how to solve this?
library(dplyr)
library(tidyverse)
library(janitor)
library(rvest)
library(magrittr)
df <- expand.grid(
  tournament_id = c("t464", "t054", "t047"),
  stat_id = c("02564", "101", "102", "336", "340")
) %>%
  mutate(
    links = paste0(
      'https://www.pgatour.com/stats/stat.',
      stat_id,
      '.y2019.eon.',
      tournament_id,
      '.html'
    )
  ) %>%
  as_tibble()
# Function to get the table
get_info <- function(link, tournament) {
  link %>%
    read_html() %>%
    html_table() %>%
    .[[2]] %>%
    clean_names() %>%
    select(-rank_last_week) %>%
    mutate(rank_this_week = rank_this_week %>%
             as.character) %>%
    mutate(tournament)
}
# Retrieve the tables and bind them
test12 <- df %$%
  map2_dfr(links, tournament_id, get_info)
test12
Answer:
You generally don't want to put a pipe inside a dplyr verb, or at least I have never seen that done. I'm also not sure why you need the conversion in this example, since avg is automatically parsed as numeric. Try this instead:
# Function to get the table
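get_info <- function(link, tournament_id) {
  data <- link %>%
    read_html() %>%
    html_table() %>%
    .[[2]] %>%
    clean_names() %>%
    select(-rank_last_week) %>%
    # Keep only the first run of digits in the rank and store it as an integer
    mutate(rank_this_week = as.integer(str_extract(rank_this_week, "\\d+")))
  # Convert these columns to character when they exist so every table has the
  # same column types; try() silently skips pages that lack the column
  try(data <- mutate(data, avg = as.character(avg)), silent = TRUE)
  try(data <- mutate(data, total_distance_feet = as.character(total_distance_feet)), silent = TRUE)
  data
}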
test12 <- df %>%
  mutate(tables = map2(links, tournament_id, get_info)) %>%
  tidyr::unnest(everything())
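For what it's worth, the type clash behind the error can be reproduced without any scraping. The sketch below uses two made-up tables (the values are invented, not taken from the site) in which avg is numeric in one and text in the other. It also shows a possible alternative to the try() calls: across() combined with any_of() only touches the listed columns that actually exist. The helper name harmonise_types is hypothetical, and this assumes dplyr 1.0 or later.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for two scraped stat pages (invented values).
numeric_page <- tibble(player_name = "Player A", avg = 71.2)
text_page <- tibble(
  player_name = "Player B",
  avg = "5' 10\"",
  total_distance_feet = "1,234"
)

# Binding directly fails because avg is numeric in one table and
# character in the other.
direct <- try(bind_rows(numeric_page, text_page), silent = TRUE)
inherits(direct, "try-error")

# Hypothetical helper: coerce the columns whose type varies between pages.
# any_of() ignores names that are missing from a given table.
harmonise_types <- function(data) {
  mutate(data, across(any_of(c("avg", "total_distance_feet")), as.character))
}

# After harmonising, the tables bind without error.
combined <- list(numeric_page, text_page) %>%
  map(harmonise_types) %>%
  bind_rows()

combined
```

If that works for your pages, the same mutate() line could replace the two try() calls inside get_info, with the columns converted back to numeric after binding where that makes sense.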