<a class="image teaser-image ng-star-inserted" target="_self" href="/politik/inland/neuwahlen-2022-welche-szenarien-jetzt-realistisch-sind/401773131">
I just want to extract the "href" (for example the upper HTML tag) in order to concat it with the domain name of this website "https://kurier.at" and web scrape all articles on the home page.
I tried the following code
library(rvest)
library(lubridate)
kurier_wbpg <- read_html("https://kurier.at")
# I just want the "a" tags which come with the attribute "_self"
articleLinks <- kurier_wbpg %>% html_elements("a")%>%
html_elements(css = "tag[attribute=_self]") %>%
html_attr("href")%>%
paste("https://kurier.at",.,sep = "")
When I execute up to the html_attr("href") part of the above code block, the result I get is
character(0)
I think something wrong with selecting the HTML element tag. I need some help with this?
CodePudding user response:
You need to narrow down your css to the second teaser block image which you can do by using the naming conventions of the classes. You can use url_absolute()
to add the domain.
library(rvest)
library(magrittr)
url <- 'https://kurier.at/'
result <- read_html(url) %>%
html_element('.teasers-2 .image') %>%
html_attr('href') %>%
url_absolute(url)
Same principle to get all teasers:
results <- read_html(url) %>%
html_elements('.teaser .image') %>%
html_attr('href') %>%
url_absolute(url)
Not sure if you want the bottom block of 5 included. If so, you can again use classes
articles <- read_html(url) %>%
html_elements('.teaser-title') %>%
html_attr('href') %>%
url_absolute(url)
CodePudding user response:
It works with xpath
-
library(rvest)
kurier_wbpg <- read_html("https://kurier.at")
articleLinks <- kurier_wbpg %>%
html_elements("a") %>%
html_elements(xpath = '//*[@target="_self"]') %>%
html_attr('href') %>%
paste0("https://kurier.at",.)
articleLinks
# [1] "https://kurier.at/plus"
# [2] "https://kurier.at/coronavirus"
# [3] "https://kurier.at/politik"
# [4] "https://kurier.at/politik/inland"
# [5] "https://kurier.at/politik/ausland"
#...
#...