Learning web scraping: unable to understand html_nodes("table") %>% `[[`(6) %>%

Time:12-03

I am learning web scraping in R and have written the following code:

url <- "https://en.wikipedia.org/wiki/World_population"  

library(rvest)
library(tidyr)
library(dplyr)

ten_most_df <- read_html(url)

ten_most_populous <- ten_most_df %>% 
  html_nodes("table") %>% `[[`(6) %>% html_table()

In the code above, what does `[[`(6) represent?

I have also looked at the documentation, which says the following, but it is still not clear to me:

"For vectors and matrices the [[ forms are rarely used, although they have some slight semantic differences from the [ form (e.g. it drops any names or dimnames attribute, and that partial matching is used for character indices)"

Could you please explain this? It would be very helpful. Thanks.

Answer:

It's just one way of selecting the 6th element from the nodeset.

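The code ten_most_df %>% html_nodes("table") returns an xml_nodeset object with one element per table on the page (26 at the time of writing; the count and order can change as the Wikipedia page is edited). `[[` is itself an ordinary R function, and %>% passes the left-hand side as the first argument of the next call, so `[[`(6) is evaluated as `[[`(nodeset, 6), which is the same as nodeset[[6]]. It subsets the object and returns the 6th node.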
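A minimal base-R sketch of that equivalence, which needs no scraping. The list `tables` is a hypothetical stand-in for the node set (it just holds 26 strings), and the pipe is only described in a comment so the example runs without magrittr or rvest:

```r
# Hypothetical stand-in for the node set: a plain list of 26 strings
tables <- as.list(paste0("table_", 1:26))

# `[[` is an ordinary R function, so these two calls do the same thing
via_brackets <- tables[[6]]       # usual bracket syntax
via_function <- `[[`(tables, 6)   # the same operator called as a function

# With magrittr, tables %>% `[[`(6) inserts the left-hand side as the
# first argument, i.e. it evaluates `[[`(tables, 6)
print(identical(via_brackets, via_function))  # [1] TRUE
print(via_function)                           # [1] "table_6"
```

The backticks are only needed because `[[` is not a syntactic name; they let you call the operator like any other function, which is what makes it usable as a pipe step.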

There is also a shorter way using only html_table(). Applied to the whole document, it returns every table on the page as a list of data frames:

ten_most_df %>% 
  html_table() %>%   # all tables on the page, as a list of data frames
  .[[6]]             # the 6th element of that list

Personally, I find this a little easier to read. The . is the magrittr placeholder for whatever was piped in (here, the list of tables), and [[n]] is the standard way to access element number n of a list.
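To connect this to the documentation quoted in the question, here is a small base-R sketch of how `[` and `[[` differ. The list `tables` and the vector `v` are made-up examples, not data from the Wikipedia page:

```r
# Hypothetical list standing in for the result of html_table()
tables <- list(a = data.frame(x = 1:2), b = data.frame(y = 3:4))

# `[` returns a sub-list: still a list, with its name kept
one_list <- tables[2]
# `[[` returns the element itself, here a data frame
one_df <- tables[[2]]

print(class(one_list))  # [1] "list"
print(class(one_df))    # [1] "data.frame"

# On a named atomic vector, `[[` also drops the name,
# which is the "drops any names" difference the quoted docs mention
v <- c(first = 10, second = 20)
print(v[2])             # keeps the name "second"
print(v[[2]])           # [1] 20 (no name)
```

This is why `[[` (or .[[6]]) is the right tool when you want the table itself rather than a one-element list containing it.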
