Classification issues between Rvest and Map_dfr during web-scrape

Time:07-22

I'm currently scraping stats from a website, but on certain stat pages I hit a snag with the following error:

Error: Column avg can't be converted from numeric to character

I tried something like mutate(avg = avg %>% as.numeric), but then I get an error saying the column avg can't be found.

The issue in the code below occurs whenever I add stat_id 336 or 340. Any ideas on how to solve this?

library(dplyr)
library(tidyverse)
library(janitor)
library(rvest)
library(magrittr)


df <- expand.grid(
  tournament_id = c("t464", "t054", "t047"),
  stat_id = c("02564", "101", "102", "336", "340")
) %>% 
  mutate(
    links = paste0(
      'https://www.pgatour.com/stats/stat.',
      stat_id,
      '.y2019.eon.',
      tournament_id,
      '.html'
    )
  ) %>% 
  as_tibble()

# Function to get the table
get_info <- function(link, tournament) {
  link %>%
    read_html() %>%
    html_table() %>%
    .[[2]] %>%
    clean_names() %>% 
    select(-rank_last_week ) %>% 
    mutate(rank_this_week = rank_this_week %>% 
             as.character) %>%
    mutate(tournament) 
}

# Retrieve the tables and bind them
test12 <- df %$%
  map2_dfr(links, tournament_id, get_info) 
test12

Answer:

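You generally don't want to put a pipe inside a dplyr verb, or at least I have never seen that done before. I'm not sure why you need it in this example, since avg parses as numeric automatically in the tables that bind without trouble. The error appears when the tables are bound together: if avg is numeric in some tables and character in others, the rows can't be combined. The "column avg can't be found" message suggests that some of the tables have no avg column at all, which is why the conversions below are wrapped in try(). Try this instead: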

# Function to get the table
get_info <- function(link, tournament_id) {
  data <- link %>%
    read_html() %>%
    html_table() %>%
    .[[2]] %>%
    clean_names() %>% 
    select(-rank_last_week ) %>% 
    mutate(rank_this_week = as.integer(str_extract(rank_this_week, "\\d+")))
  try(data <- mutate(data, avg = as.character(avg)), silent = TRUE)
  try(data <- mutate(data, total_distance_feet = as.character(total_distance_feet)), silent = TRUE)
  data
}

test12 <- df %>%
  mutate(tables = map2(links, tournament_id, get_info)) %>%
  tidyr::unnest(everything())
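
The type clash can be reproduced without touching the website. The sketch below is only an illustration under stated assumptions: the three toy tables and their values are made up, and harmonise() is a hypothetical helper, not part of the answer above. It swaps the try() calls for across(any_of(...)), which converts the listed columns to character where they exist and silently skips tables that lack them.

```r
library(dplyr)

# Toy tables standing in for scraped stat pages (made-up values).
# On one page `avg` parses as numeric, on another as character.
page_a <- tibble(player_name = c("A", "B"), avg = c(70.1, 71.3))
page_b <- tibble(player_name = c("C", "D"), avg = c("27' 5\"", "28' 1\""))
# A third page has no `avg` column at all.
page_c <- tibble(player_name = "E", total_distance_feet = 1234)

# Binding the raw tables fails because `avg` has incompatible types.
raw_result <- try(bind_rows(page_a, page_b), silent = TRUE)
inherits(raw_result, "try-error")

# Hypothetical helper: coerce the troublesome columns to character,
# skipping any column that a given table does not have.
harmonise <- function(data) {
  data %>%
    mutate(across(any_of(c("avg", "total_distance_feet")), as.character))
}

# After harmonising, every table agrees on the column types and binds cleanly.
combined <- list(page_a, page_b, page_c) %>%
  lapply(harmonise) %>%
  bind_rows()
combined
```

If this approach suits the real pages, the same mutate(across(any_of(...), as.character)) step could probably replace the two try() lines inside get_info, though that has not been checked against the live site.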