Website: https://www.moe.gov.sg/schoolfinder/schooldetail?schoolname=ZHONGHUA-SECONDARY-SCHOOL
I only want to extract the information under "DSA talent areas offered in 2021".
However, when I use SelectorGadget, I get the path .is--open:nth-child(4) .moe-collapsible__content, and using it like this does not work:
# listpage is assumed to be the result of read_html() on the URL above
dsa <- html_node(listpage,".is--open:nth-child(4) .moe-collapsible__content") %>% html_text() %>% unlist()
dsa
The output is NA.
Is there any way to get the information from the collapsible content?
CodePudding user response:
One way to do it:
library(rvest)
library(dplyr)
library(stringr)
'https://www.moe.gov.sg/schoolfinder/schooldetail?schoolname=ZHONGHUA-SECONDARY-SCHOOL' %>%
  read_html() %>%
  html_nodes('.moe-collapsible__content') %>%
  html_nodes('.moe-list') %>%
  html_text() %>%
  nth(3) %>%
  str_split('\n')
[[1]]
[1] "Leadership and Character (Girls and Boys)\r"
[2] " Chinese Orchestra (Girls and Boys)\r"
[3] " Choir (Girls and Boys)\r"
[4] " Concert Band (Girls and Boys)\r"
[5] " Guzheng Ensemble (Girls and Boys)\r"
[6] " Badminton (Girls)\r"
[7] " Basketball (Girls)\r"
[8] " Table Tennis (Boys)\r"
[9] " Volleyball (Boys)\r"
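The entries above still carry a trailing "\r" and, from the second one on, a leading space. A minimal sketch of one way to clean them up with str_trim() from stringr follows. The raw string here is a shortened, hand-written stand-in for what html_text() returned, not the actual page content.

```r
library(stringr)

# Hypothetical stand-in for the text returned by html_text() for the DSA list
raw <- "Leadership and Character (Girls and Boys)\r\n Chinese Orchestra (Girls and Boys)\r\n Choir (Girls and Boys)\r"

# Split on newlines, take the single list element, then strip
# surrounding whitespace (which includes the carriage returns)
areas <- str_trim(str_split(raw, "\n")[[1]])
areas
```

Piping the scraped text through the same two steps should give a plain character vector without the stray whitespace.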
CodePudding user response:
You can be more precise by combining :contains with the class to target the correct parent div, then using a descendant selector to reach the child li elements. Matching on a partial string ("DSA talent areas", without the year) may also give you some future-proofing for 2022.
library(magrittr)
library(rvest)
read_html("https://www.moe.gov.sg/schoolfinder/schooldetail?schoolname=ZHONGHUA-SECONDARY-SCHOOL") %>%
  html_elements('.moe-collapsible:contains("DSA talent areas") li') %>%
  html_text()
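To see how the selector picks out only the matching section, here is a small offline sketch. The HTML below is a made-up stand-in built with minimal_html(). Only the moe-collapsible and moe-list class names come from this thread, and the real page structure may differ.

```r
library(rvest)

# Hypothetical page fragment: two collapsible sections, only one about DSA
page <- minimal_html('
  <div class="moe-collapsible">
    <h3>Subjects offered</h3>
    <ul class="moe-list"><li>Mathematics</li></ul>
  </div>
  <div class="moe-collapsible">
    <h3>DSA talent areas offered in 2021</h3>
    <ul class="moe-list"><li>Choir (Girls and Boys)</li><li>Badminton (Girls)</li></ul>
  </div>
')

# :contains keeps only the div whose text includes the partial string;
# the trailing " li" then selects the list items inside that div
areas <- page %>%
  html_elements('.moe-collapsible:contains("DSA talent areas") li') %>%
  html_text(trim = TRUE)
areas
```

Because the match is on the section's text rather than its position, this should keep working if the sections are reordered, as long as the heading wording stays similar.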