I want to extract the references from an article on this page:
https://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S2448-76782022000100004&lang=es
I have tried this:
library(rvest)
library(dplyr)
product_names = simple %>%
html_nodes(xpath= '//*[contains(concat( " ", @class, " " ), concat( " ", "references", " " ))]') %>%
html_text()
but it did not work.
How can I extract the references?
Answer:
Here is a way.
The main complication is the presence of multi-byte characters at the end of each string, so the pipeline transliterates the text to ASCII with iconv() before trimming whitespace.
suppressPackageStartupMessages({
library(rvest)
library(dplyr)
})
link <- "https://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S2448-76782022000100004&lang=es"
page <- read_html(link)
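refs <- page %>%
  # the references sit in the element with id "article-back"
  html_elements(xpath = '//*[@id="article-back"]') %>%
  html_elements("p") %>%
  html_text() %>%
  # remove newlines, tabs, square brackets and the "Links" label
  gsub("[\n\t]", "", .) %>%
  gsub("\\[|\\]", "", .) %>%
  gsub("Links", "", .) %>%
  # transliterate multi-byte characters to ASCII, then trim whitespace
  iconv(from = 'UTF-8', to = 'ASCII//TRANSLIT') %>%
  trimws()
# keep elements 3 to 70, which hold the references on this page
refs <- refs[3:70]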
head(refs)
#> [1] "Alaie, S. A. (2020). Knowledge and learning in the horticultural innovation system: A case of Kashmir valley of India. International Journal of Innovation Studies, 4(1), 116-133. https://doi.org/10.1016/j.ijis.2020.06.002."
#> [2] "Andersson, U., Dasi, A., Mudambi, R., & Pedersen, T. (2016)Technology, innovation and knowledge: The importance of ideas and internationalconnectivity. Journal of World Business,51(1), 153-162.https://doi.org/10.1016/j.jwb.2015.08.017."
#> [3] "Arroyo, F. J., Sanchez, J., & Sole, M. L. (2017). La calidad e innovacion como factores de diferenciacion para el comercio electronico de ropa interior de una marca latinoamericana en Espana. Contabilidad y Negocios, 12(23), 52-61. h ttps://doi.org/10.18800/contabilidad.201701.004."
#> [4] "Bach, H., Makitie, T., Hansen, T., & Steen, M. (2021). Blending new and old in sustainability transitions: Technological alignment between fossil fuels and biofuels in Norwegian coastal shipping. Energy Research & Social Science, 74(1), 101957. https://doi.org/10.1016/j.erss.2021.101957."
#> [5] "Bodas, I. M., Marques, R. A.., & Silva, E. M. (2013). University-industry collaboration and innovation in emergent and mature industries in new industrialized countries. Research Policy, 42(2), 443-453. https://doi.org/10.1016/j.respol.2012.06.006."
#> [6] "Bourke, J., & Roper, S. (2017). Innovation, quality management and learning: Short-term and longer-term e?ects. Research Policy, 46(1), 1505-1518. https://doi.org/10.1016/j.respol.2017.07.005."
Created on 2022-10-21 with reprex v2.0.2
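A note on the iconv() step: transliterating to ASCII also strips accents, as the output above shows ("Sanchez", "Espana"), and can turn unmapped glyphs into "?" ("e?ects"). If the accented characters should be kept, one possible alternative is to replace only the offending character instead of converting the whole string. The sketch below assumes that the trailing multi-byte character is a non-breaking space (U+00A0), which has not been verified against the page; the sample string and the clean_ref() helper are hypothetical and only illustrate the idea with base R.

```r
# Hypothetical sample mimicking one scraped reference. The "\u00a0"
# (non-breaking space) characters are an assumption about the page's markup.
raw <- "\n\t\tArroyo, F. J., S\u00e1nchez, J. (2017). Contabilidad y Negocios, 12(23), 52-61.\u00a0[\u00a0Links\u00a0]\n"

# clean_ref() is an illustrative helper, not part of rvest or the answer above.
clean_ref <- function(x) {
  x <- gsub("[\n\t]", "", x)                 # drop newlines and tabs
  x <- gsub("\\[|\\]", "", x)                # drop the square brackets
  x <- gsub("Links", "", x, fixed = TRUE)    # drop the "Links" label
  x <- gsub("\u00a0", " ", x, fixed = TRUE)  # non-breaking space -> plain space
  trimws(x)                                  # now the trailing spaces can be trimmed
}

cleaned <- clean_ref(raw)
print(cleaned)
```

In the pipeline above, this would mean swapping the iconv() line for gsub("\u00a0", " ", ., fixed = TRUE). The accents survive because no encoding conversion takes place.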