I'm scraping a website and while trying to merge the data, I get this error:
Error in `f()`:
! Can't join on `x$Grand Slam Pts` x `y$Grand Slam Pts`
because of incompatible types.
ℹ `x$Grand Slam Pts` is of type <character>.
ℹ `y$Grand Slam Pts` is of type <integer>.
Run `rlang::last_error()` to see where the error occurred.
As the error says, the 'Grand Slam Pts' column is character in some of the scraped tables and integer in others. All the tables are stored in a list (table_list). How do I convert the 'Grand Slam Pts' column to integer in every table so the join works?
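For reference, here is a minimal, self-contained reproduction of the same error. The two tibbles are hypothetical stand-ins for two scraped pages: on one, `html_table()` produced an integer column, and on the other the column stayed character (here assumed to be because of a "-" placeholder).

```r
library(dplyr)

# Hypothetical stand-ins for two scraped pages
page_a <- tibble(Player = c("A", "B"), `Grand Slam Pts` = c(120L, 90L))
page_b <- tibble(Player = c("C", "D"), `Grand Slam Pts` = c("45", "-"))

table_list <- list(page_a, page_b)

# full_join() without `by` joins on every shared column, including
# `Grand Slam Pts`, so integer vs character key columns make it fail.
result <- tryCatch(
  Reduce(full_join, table_list),
  error = function(e) e
)

# TRUE: the join raised an error instead of returning a table
inherits(result, "error")
```

`conditionMessage(result)` shows an "incompatible types" message like the one above (the exact wording depends on the dplyr version).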
library(rvest); library(tidyverse); library(stringr)
my_links = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/links_for_table400-415.csv')
my_links = unname(unlist(my_links))
table_list <- list()
for (i in 1:16) {
  page <- read_html(my_links[i])
  the_table <- page %>% html_table() %>% .[[1]] %>%
    filter(X1 != "")
  colnames(the_table) <- page %>% html_element(xpath = "//thead") %>%
    html_text() %>%
    str_replace_all("\n ", "\n") %>%
    str_split("\n") %>%
    unlist() %>%
    .[. != ""]
  table_list[[i]] <- the_table
}
final_table <- Reduce(full_join, table_list)
Answer:
You can do two things:
1. Ensure the correct column format upstream (i.e. in your for loop when scraping & parsing data), or
2. Fix things downstream (i.e. after scraping & parsing data).
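For the downstream option, a minimal sketch: convert `Grand Slam Pts` to integer in every data frame of the list before reducing. The toy tables and the helper `to_int_pts()` are hypothetical, and the sketch assumes "-" is the only non-numeric placeholder in that column; any other text (e.g. "1,234") would become NA with a warning.

```r
library(dplyr)

# Hypothetical toy data standing in for the scraped `table_list`
table_list <- list(
  tibble(Player = c("A", "B"), `Grand Slam Pts` = c(120L, 90L)),
  tibble(Player = c("C", "D"), `Grand Slam Pts` = c("45", "-"))
)

# Hypothetical helper: go through character first so integer and
# character columns are handled the same way, turn "-" into NA,
# then convert to integer.
to_int_pts <- function(df) {
  pts <- as.character(df[["Grand Slam Pts"]])
  df[["Grand Slam Pts"]] <- as.integer(na_if(pts, "-"))
  df
}

table_list <- lapply(table_list, to_int_pts)

# Every `Grand Slam Pts` column is now integer, so the join succeeds
final_table <- Reduce(full_join, table_list)
```

With the real data, drop the toy `table_list` and run `lapply(table_list, to_int_pts)` on the list built by the loop.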
I recommend option 1, since it's generally better to fix column types where the data are created. I have added in-line comments to the two new mutate() steps.
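# Assumes `my_links` and the packages loaded in the question
table_list <- list()
for (i in 1:16) {
  page <- read_html(my_links[i])
  the_table <- page %>% html_table() %>% .[[1]] %>%
    filter(X1 != "") %>%
    # Replace "-" with NA in character columns only; with dplyr >= 1.1.0,
    # na_if(<integer column>, "-") throws a type error
    mutate(across(where(is.character), ~ na_if(.x, "-"))) %>%
    # Use `readr::parse_guess` to guess the type of each character column;
    # guess_integer = TRUE returns whole numbers as integer, not double
    mutate(across(where(is.character), ~ parse_guess(.x, guess_integer = TRUE)))
  colnames(the_table) <- page %>% html_element(xpath = "//thead") %>%
    html_text() %>%
    str_replace_all("\n ", "\n") %>%
    str_split("\n") %>%
    unlist() %>%
    .[. != ""]
  table_list[[i]] <- the_table
}
final_table <- Reduce(full_join, table_list)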
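Note: I'm using readr::parse_guess() here instead of as.integer() because it is applied to every character column. parse_guess() converts a column only when all its non-missing values parse as one type (e.g. numbers) and otherwise leaves it as character, so text columns such as player names survive; as.integer() would turn any non-numeric text into NA. This may or may not be necessary for your data.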
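A small illustration of the difference, using hypothetical vectors. It also shows that parse_guess() returns double for whole numbers unless guess_integer = TRUE is set, which matters if you specifically want integer columns (a join between integer and double keys works, but character vs numeric does not).

```r
library(readr)

pts     <- c("120", "90", NA)          # numbers, "-" already replaced by NA
players <- c("Player A", "Player B")   # a text column, e.g. player names

# Default: whole numbers are guessed as double
pts_default <- parse_guess(pts)
# guess_integer = TRUE returns integer for whole numbers
pts_int <- parse_guess(pts, guess_integer = TRUE)
# A column that is not entirely numeric is left as character
players_guess <- parse_guess(players)
# as.integer() instead turns the text into NA (with a warning, suppressed here)
players_int <- suppressWarnings(as.integer(players))

class(pts_default)   # "numeric"
class(pts_int)       # "integer"
class(players_guess) # "character"
```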