I am learning web scraping in R and have written the following code:
url <- "https://en.wikipedia.org/wiki/World_population"
library(rvest)
library(tidyr)
library(dplyr)
ten_most_df <- read_html(url)
ten_most_populous <- ten_most_df %>%
  html_nodes("table") %>%
  `[[`(6) %>%
  html_table()
In the code above, what does `[[`(6) represent?
I have also looked at the documentation, which contains the following text, but it is still not clear to me:
"For vectors and matrices the [[ forms are rarely used, although they have some slight semantic differences from the [ form (e.g. it drops any names or dimnames attribute, and that partial matching is used for character indices)"
Could you please explain this? It would be very helpful. Thanks.
CodePudding user response:
It's just one way of selecting the 6th element from the nodeset.
The code ten_most_df %>% html_nodes("table") returns an xml_nodeset object with 26 elements, corresponding to the 26 tables on the page. `[[`(6) subsets the object and returns the 6th node.
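The backtick form works because [[ is itself a function in R, so x[[6]] and `[[`(x, 6) are the same call. Here is a minimal sketch using a plain list; the list named tables is made up for illustration and only stands in for the nodeset, so it runs without any packages:

```r
# Hypothetical stand-in for the nodeset: a plain list of seven items.
tables <- list("a", "b", "c", "d", "e", "f", "g")

# The usual subsetting syntax ...
usual <- tables[[6]]

# ... is just a call to the function named `[[`.
# Backticks are needed because [[ is not a syntactically valid name.
as_call <- `[[`(tables, 6)

# The %>% pipe passes its left-hand side as the first argument,
# so tables %>% `[[`(6) is evaluated as `[[`(tables, 6).
identical(usual, as_call)  # TRUE
```

So in the original code, the piped nodeset becomes the first argument of `[[` and 6 is the index.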
In fact, there's a quicker way using only html_table, which returns the tables in a list:
ten_most_df %>%
  html_table() %>%
  .[[6]]
Personally I find this a little easier to read; the . represents the list and [[n]] is the standard way to access list element number n.
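As for the documentation sentence you quoted, the differences between [ and [[ are easiest to see on small objects. The sketch below uses made-up data (the names x and lst are only for illustration). Note that in current versions of R, partial matching with [[ has to be requested explicitly through the exact argument:

```r
# A named numeric vector (illustrative data).
x <- c(alpha = 1, beta = 2)

x["alpha"]           # keeps the name: a named vector of length 1
x[["alpha"]]         # drops the name: just the value

names(x["alpha"])    # "alpha"
names(x[["alpha"]])  # NULL

# On a list, [ returns a shorter list while [[ returns the element itself.
lst <- list(first = 1:3, second = letters)
class(lst["first"])    # "list"
class(lst[["first"]])  # "integer"

# Partial matching of character indices is only available through [[,
# and only when requested with exact = FALSE.
lst[["fir", exact = FALSE]]  # the element named "first"
lst[["fir"]]                 # NULL, because exact = TRUE by default
```

For scraping, the list case is the one that matters: [[6]] gives you the table itself, whereas [6] would give you a one-element list containing it.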