Learning web scraping: need some clarity on xpath="/html/body/div[3]/div[3]/div[4]/div/table[5

Time:12-07

I am learning web scraping in R and understand the HTML code, but one thing still confuses me.

CODE 1 :

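library(rvest)  # read_html(), html_table(), html_nodes(), %>%

url <- "https://en.wikipedia.org/wiki/World_population"
ten_most_df <- read_html(url)

# html_table() returns every table in the document; keep the 6th one
ten_most_populous <- ten_most_df %>%
  html_table() %>%
  .[[6]]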

CODE 2 :

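library(rvest)  # read_html(), html_table(), html_nodes(), %>%

url <- "https://en.wikipedia.org/wiki/World_population"
ten_most_df <- read_html(url)

# Select a single table by its absolute path from the document root
ten_most_populous <- ten_most_df %>%
  html_nodes(xpath = "/html/body/div[3]/div[3]/div[4]/div/table[5]") %>%
  html_table()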

Are the methods used in Code 1 and Code 2 the same? In Code 1 we are scraping the 6th table, but Code 2 is not clear to me, because div[3] is repeated twice. Can you please give some clarity on this? It would be a great help. Thanks.

CodePudding user response:

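body/div[3]/div[3]/div[4] means the 4th div child of the 3rd div child of the 3rd div child of the body element. div[3] appears twice because each step is counted only among the children of the element selected by the step before it.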
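To compare the two approaches from the question, here is a minimal sketch on a made-up page. The HTML below is hypothetical and is not Wikipedia's real markup; it only assumes that one table appears earlier in the document than the ones addressed by the path. Under that assumption, the 2nd table child reached by the XPath is the 3rd table in the whole document, which is the same kind of offset as table[5] versus .[[6]] in the question.

```r
library(rvest)  # also makes read_html() and %>% available

# Hypothetical miniature page, built only for illustration
page <- read_html('
<html><body>
  <div><table><tr><td>intro table</td></tr></table></div>
  <div>second</div>
  <div>
    <div>a</div>
    <div>b</div>
    <div>
      <table><tr><td>nested table 1</td></tr></table>
      <table><tr><td>nested table 2</td></tr></table>
    </div>
  </div>
</body></html>')

# Every step counts siblings under its own parent: the 3rd div child of body,
# then the 3rd div child of that div, then the 2nd table child of that one.
by_path <- page %>%
  html_nodes(xpath = "/html/body/div[3]/div[3]/table[2]") %>%
  html_text()

# html_table() on the whole document returns all tables in document order,
# so .[[3]] is the 3rd table anywhere on the page.
by_index <- page %>%
  html_table() %>%
  .[[3]]

print(by_path)
print(by_index)
```

So the two methods count differently: Code 1 counts tables across the whole page, while Code 2 counts only the table children of one specific div. Whether they end up at the same table on the real page depends on that page's markup at the time it is scraped.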

You really should be finding that out by reading a reference book on XPath, not by asking on StackOverflow.
