rvest read_html returns data from a different url

Time:05-01

I'm trying to collect some data on a couple of UFC fighters. When I use rvest to read the url it always returns a different url's data. Instead of Rob Font I get random fighters. The URL is directly accessible through a browser. Is this an anti-scraping tactic from the site or am I missing something obvious? Thanks.

library(rvest)
library(tidyverse)
# Download and parse the fighter page
page <- read_html("https://www.tapology.com/fightcenter/fighters/rob-font")
# Extract the text of every <span> inside the stats block
name <- page %>% html_nodes("div#stats.details.details_two_columns") %>% html_nodes("span") %>% html_text()

Answer:

I can't reproduce the problem with your code. The following returns Rob Font's details for me:

library(rvest)
library(tidyverse)

ufc <- "https://www.tapology.com/fightcenter/fighters/rob-font" %>% 
  read_html() 

tibble(
  detail = ufc %>% html_elements("strong:nth-child(1)") %>% 
    html_text2() %>%  
    .[1:14] %>% 
    str_replace_all(":", ""), 
  value = ufc %>% 
    html_elements("#stats span:nth-child(2)") %>%  
    html_text2()
)

   # A tibble: 14 x 2
   detail                    value                   
   <chr>                     <chr>                   
 1 Given Name                "Robert Font"           
 2 Pro MMA Record            "19-5-0 (Win-Loss-Draw)"
 3 Nickname                  "N/A"                   
 4 Current Streak            "1 Loss"                
 5 Age                       "1987-06-25"            
 6 Last Fight                "December 04, 2021"     
 7 Weight Class              "Bantamweight"          
 8 Affiliation               "New England Cartel"    
 9 Height                    "5'8\" (173cm)"         
10 Career Disclosed Earnings "$493,000 USD"          
11 Born                      "Tampa, Florida"        
12 Fighting out of           "Boston, Massachusetts" 
13 Head Coach                "Tyson Chartier"        
14 Other Coaches             "N/A"   
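
If the wrong fighter still comes back on your machine, it may help to check what the server actually returned before parsing it. The following is only a rough sketch, not a confirmed explanation of the behaviour: it assumes the httr package is installed, the helper names `fetch_checked` and `page_title` are made up for illustration, and the user agent string is an arbitrary placeholder. It requests the page, stops on an HTTP error, warns if the final URL differs from the one requested, and then prints the page title so you can see which fighter the HTML belongs to.

```r
library(rvest)
library(httr)

# Hypothetical helper: fetch a URL and report where the request ended up.
fetch_checked <- function(target) {
  # The user agent string is an arbitrary example, not a required value.
  resp <- GET(target, user_agent("Mozilla/5.0 (compatible; example-scraper)"))
  stop_for_status(resp)  # raise an error on 4xx/5xx responses
  if (!identical(resp$url, target)) {
    warning("Request ended at a different URL: ", resp$url)
  }
  # Parse the body that was actually received
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}

# Hypothetical helper: return the text of the page's <title> element.
page_title <- function(page) {
  page %>% html_element("title") %>% html_text2()
}

# Only runs in an interactive session, since it needs network access.
if (interactive()) {
  page <- fetch_checked("https://www.tapology.com/fightcenter/fighters/rob-font")
  print(page_title(page))
}
```

If the title names a different fighter, or the warning fires, the difference arises on the server side or in a redirect rather than in the CSS selectors.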