Extracting repeated class with rvest html_elements in R

Time:07-01

I am trying to extract some information from this sports betting webpage using rvest. I asked a related question a few days ago and got almost everything I needed. So far, thanks to that help, I have successfully extracted the title, the score and the time of the matches being played, using the following code:

library(rvest)
library(tidyverse)

page <- "https://www.supermatch.com.uy/live_recargar_menu/" %>%
  read_html()

data <- data.frame(
  Titulo = page %>%
    html_elements(".titulo") %>%
    html_text(),
  Marcador = page %>%
    html_elements(".marcador") %>%
    html_text(),
  Tiempo = page %>%
    html_elements(".marcador span") %>%
    html_text() %>%
    str_squish()
)

Now I want to attach values that repeat across matches. For example, if the country of a match is "Brasil", I want the data frame to show Brasil as the country for every match in that category. So far I have only managed to extract all the countries on their own, not linked to their matches. The same applies to the sport name and the tournament.

Can you help me with that? Thanks in advance.

Answer:

You could rewrite your code as separate functions, each working at a different level of the page. Calling them in a nested fashion makes the code easier to read.

Essentially, this uses nested map_dfr() calls to build a single data frame from functions that work on node lists at different levels of the DOM.
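To see the mechanism in isolation, here is a minimal sketch that leaves the HTML out entirely. The `groups` list and its values are made-up sample data, not taken from the site. It only illustrates how map_dfr() row-binds one small data frame per parent, and how a single parent value gets recycled onto every child row. Note that map_dfr() needs dplyr installed, and that from purrr 1.0.0 onwards it is marked as superseded in favour of map() followed by list_rbind(), although it still works.

```r
library(purrr)

# Hypothetical sample data: each country (parent) holds some match titles (children).
groups <- list(
  Brasil  = c("A vs B", "C vs D"),
  Uruguay = c("E vs F")
)

# One data frame per country; map_dfr() then row-binds them into one.
rows <- map_dfr(names(groups), function(country) {
  df <- data.frame(titulo = groups[[country]], stringsAsFactors = FALSE)
  # A length-1 value is recycled onto every row of this country's data frame.
  df$country <- country
  df
})

print(rows)
```

The same pattern applies to HTML nodes: the parent value is read once per parent node and assigned after the child rows have been built.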

In the code below, think of it as an outer loop over sports, an intermediate loop over countries, and an innermost loop over the events within a given sport and country.

library(rvest)
library(tidyverse)

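# Outer level: one call per sport node.
get_sport_info <- function(sport) {
  # Row-bind the events of every category (country) inside this sport.
  df <- map_dfr(sport %>% html_elements(".category"), get_play_info)
  # The sport name is a single value, recycled onto every row.
  df$sport <- sport %>%
    html_element(".sport-name") %>%
    html_text()
  return(df)
}


# Intermediate level: one call per category (country) node.
get_play_info <- function(play) {
  # Innermost level: one row per event. html_element() (singular) returns
  # exactly one result per event, NA if missing, so the columns stay aligned.
  df <- map_dfr(play %>% html_elements(".event"), ~
    data.frame(
      titulo = .x %>% html_element(".titulo") %>% html_text(),
      marcador = .x %>% html_element(".marcador") %>% html_text(),
      tiempo = .x %>% html_element(".marcador span") %>% html_text() %>% str_squish()
    ))
  # The country name is a single value, recycled onto every row.
  df$country <- play %>%
    html_element(".category-name") %>%
    html_text()
  return(df)
}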


page <- "https://www.supermatch.com.uy/live_recargar_menu/" %>% read_html()
sports <- page %>% html_elements(".sport")
final <- map_dfr(sports, get_sport_info)
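
Since the live page changes constantly (and may not always be reachable), here is a self-contained sketch you can run offline to check the approach. Only the class names are taken from the code above. The nesting of the markup, the sample values, and the helper names `event_rows()` and `sport_rows()` are assumptions made up for illustration, so the real page may be structured differently.

```r
library(rvest)
library(purrr)

# Hypothetical markup: the class names match the selectors used above,
# but the structure and the values are invented for this example.
demo_page <- minimal_html('
  <div class="sport">
    <span class="sport-name">Futbol</span>
    <div class="category">
      <span class="category-name">Brasil</span>
      <div class="event"><span class="titulo">A vs B</span><span class="marcador">1-0 <span>45 min</span></span></div>
      <div class="event"><span class="titulo">C vs D</span><span class="marcador">0-0 <span>10 min</span></span></div>
    </div>
    <div class="category">
      <span class="category-name">Uruguay</span>
      <div class="event"><span class="titulo">E vs F</span><span class="marcador">2-1 <span>80 min</span></span></div>
    </div>
  </div>
  <div class="sport">
    <span class="sport-name">Tenis</span>
    <div class="category">
      <span class="category-name">Argentina</span>
      <div class="event"><span class="titulo">G vs H</span><span class="marcador">6-4 <span>Set 2</span></span></div>
    </div>
  </div>
')

# Innermost level: one row per event, then the country recycled onto each row.
event_rows <- function(category) {
  df <- map_dfr(html_elements(category, ".event"), function(ev) {
    data.frame(
      titulo = html_text(html_element(ev, ".titulo")),
      tiempo = html_text(html_element(ev, ".marcador span")),
      stringsAsFactors = FALSE
    )
  })
  df$country <- html_text(html_element(category, ".category-name"))
  df
}

# Outer level: all categories of one sport, then the sport recycled onto each row.
sport_rows <- function(sport) {
  df <- map_dfr(html_elements(sport, ".category"), event_rows)
  df$sport <- html_text(html_element(sport, ".sport-name"))
  df
}

demo <- map_dfr(html_elements(demo_page, ".sport"), sport_rows)
print(demo)
```

With this sample markup, each event row should end up carrying the country and sport of the blocks that enclose it. If the real page nests its elements differently, the selectors at each level would need adjusting.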