I'm trying to collect some data on a couple of UFC fighters. When I use rvest to read the URL, it always returns data from a different URL: instead of Rob Font I get random fighters. The URL is directly accessible through a browser. Is this an anti-scraping tactic from the site, or am I missing something obvious? Thanks.
library(rvest)
library(tidyverse)
# Download and parse the fighter page
page <- read_html("https://www.tapology.com/fightcenter/fighters/rob-font")
# Extract the text of every <span> inside the stats block
name <- page %>% html_nodes("div#stats.details.details_two_columns") %>% html_nodes("span") %>% html_text()
CodePudding user response:
I can't reproduce that with your code. Here is what I get when pulling the labels and values into a tibble:
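library(rvest)
library(tidyverse)

# Download and parse the fighter page
ufc <- "https://www.tapology.com/fightcenter/fighters/rob-font" %>%
  read_html()

tibble(
  # Labels: first-child <strong> elements, first 14 matches, colons removed
  detail = ufc %>%
    html_elements("strong:nth-child(1)") %>%
    html_text2() %>%
    .[1:14] %>%
    str_replace_all(":", ""),
  # Values: second-child <span> elements inside the #stats block
  value = ufc %>%
    html_elements("#stats span:nth-child(2)") %>%
    html_text2()
)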
# A tibble: 14 x 2
   detail                    value
   <chr>                     <chr>
 1 Given Name                "Robert Font"
 2 Pro MMA Record            "19-5-0 (Win-Loss-Draw)"
 3 Nickname                  "N/A"
 4 Current Streak            "1 Loss"
 5 Age                       "1987-06-25"
 6 Last Fight                "December 04, 2021"
 7 Weight Class              "Bantamweight"
 8 Affiliation               "New England Cartel"
 9 Height                    "5'8\" (173cm)"
10 Career Disclosed Earnings "$493,000 USD"
11 Born                      "Tampa, Florida"
12 Fighting out of           "Boston, Massachusetts"
13 Head Coach                "Tyson Chartier"
14 Other Coaches             "N/A"
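One caveat with the approach above: the labels and values come from two separate selectors, so they only line up if both return the same number of matches (hence the `.[1:14]`). A possible alternative, sketched below, is to select each row first and then pull the label and value from inside it with `html_element()`, which returns one result per row. The HTML here is a hypothetical stand-in built with `minimal_html()`; I'm assuming each stat sits in an `li` with a `strong` label and a `span` value, which may not match the real page.

```r
library(rvest)

# Hypothetical stand-in for the stats block; the real markup may differ.
page <- minimal_html('
  <div id="stats">
    <ul>
      <li><strong>Given Name:</strong> <span>Robert Font</span></li>
      <li><strong>Weight Class:</strong> <span>Bantamweight</span></li>
      <li><strong>Nickname:</strong></li>
    </ul>
  </div>')

# One node per stat row
rows <- html_elements(page, "#stats li")

# html_element() (singular) yields exactly one result per row,
# so label and value stay aligned even when a row has no <span>.
details <- data.frame(
  detail = sub(":$", "", html_text2(html_element(rows, "strong"))),
  value  = html_text2(html_element(rows, "span")),
  stringsAsFactors = FALSE
)

print(details)
```

Rows without a value come back as `NA` instead of shifting the other values up, so there is no need to trim one vector to match the other.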