Extracting links from a drop down menu using rvest/RSelenium

Time:04-13

I am trying to scrape a list of place names from a dropdown box. The dropdown corresponds to "Zona" on the following page: https://www.fotocasa.es/es/comprar/viviendas/a-coruna-provincia/todas-las-zonas/l

I can read in the HTML using:

library(rvest)

url = 'https://www.fotocasa.es/es/comprar/viviendas/a-coruna-provincia/todas-las-zonas/l'
html_full_page = url %>% 
  read_html()

However, I cannot seem to find the correct XPath or the correct nodes.

html_full_page %>% 
  html_nodes(xpath = '//*[@id="App"]/div[2]/div/div[2]/div[3]/div/div[1]')

I thought the nodes would correspond to the following class:

re-GeographicSearchNext-linksList-items

But I cannot seem to collect this data.

What I would like to collect is a small data frame for the dropdown menu, like this:

Place Name    Properties    URL
A Barcala     69            /es/comprar/viviendas/a-coruna-provincia/a-barcala/l
Arzúa         105           /es/comprar/viviendas/a-coruna-provincia/arzua/l
Barbanza      636           /es/comprar/viviendas/a-coruna-provincia/barbanza/l

Element copied from the webpage:

<div>
  <a title="A Barcala" href="/es/comprar/viviendas/a-coruna-provincia/a-barcala/l"><span>A Barcala</span><span>69</span></a>
  <a title="Arzúa" href="/es/comprar/viviendas/a-coruna-provincia/arzua/l"><span>Arzúa</span><span>105</span></a>
  <a title="Barbanza" href="/es/comprar/viviendas/a-coruna-provincia/barbanza/l"><span>Barbanza</span><span>636</span></a>
  <a title="Bergantiños" href="/es/comprar/viviendas/a-coruna-provincia/bergantinos/l"><span>Bergantiños</span><span>581</span></a>
  <a title="Comarca de A Coruña" href="/es/comprar/viviendas/a-coruna-provincia/comarca-de-a-coruna/l"><span>Comarca de A Coruña</span><span>3.701</span></a>
  <a title="Comarca de Betanzos" href="/es/comprar/viviendas/a-coruna-provincia/comarca-de-betanzos/l"><span>Comarca de Betanzos</span><span>715</span></a>
  <a title="Comarca de Ferrol" href="/es/comprar/viviendas/a-coruna-provincia/comarca-de-ferrol/l"><span>Comarca de Ferrol</span><span>3.698</span></a>
  <a title="Comarca de Santiago" href="/es/comprar/viviendas/a-coruna-provincia/comarca-de-santiago/l"><span>Comarca de Santiago</span><span>1.910</span></a>
  <a title="Eume" href="/es/comprar/viviendas/a-coruna-provincia/eume/l"><span>Eume</span><span>282</span></a>
  <a title="Fisterra" href="/es/comprar/viviendas/a-coruna-provincia/fisterra/l"><span>Fisterra</span><span>198</span></a>
  <a title="Muros" href="/es/comprar/viviendas/a-coruna-provincia/muros/l"><span>Muros</span><span>78</span></a>
  <a title="Noia" href="/es/comprar/viviendas/a-coruna-provincia/noia/l"><span>Noia</span><span>192</span></a>
  <a title="O Sar" href="/es/comprar/viviendas/a-coruna-provincia/o-sar/l"><span>O Sar</span><span>79</span></a>
  <a title="Ordes" href="/es/comprar/viviendas/a-coruna-provincia/ordes/l"><span>Ordes</span><span>128</span></a>
  <a title="Ortegal" href="/es/comprar/viviendas/a-coruna-provincia/ortegal/l"><span>Ortegal</span><span>248</span></a>
  <a title="Terra de Melide" href="/es/comprar/viviendas/a-coruna-provincia/terra-de-melide/l"><span>Terra de Melide</span><span>126</span></a>
  <a title="Terra de Soneira" href="/es/comprar/viviendas/a-coruna-provincia/terra-de-soneira/l"><span>Terra de Soneira</span><span>68</span></a>
  <a title="Xallas" href="/es/comprar/viviendas/a-coruna-provincia/xallas/l"><span>Xallas</span><span>20</span></a>
</div>

User response:

We can get the dropdown data by first clicking the element that opens the menu and then extracting the required info from the rendered page source:

library(RSelenium)
library(rvest)

# start a Selenium server and a Firefox client
rD <- rsDriver(browser = "firefox", port = 4536L)
remDr <- rD[["client"]]

# navigate to the listing page
url <- 'https://www.fotocasa.es/es/comprar/viviendas/a-coruna-provincia/todas-las-zonas/l'
remDr$navigate(url)

# click on Zona to open the dropdown
remDr$findElement(using = "xpath", '//*[@id="save-results-panel-trigger"]')$clickElement()

# hand the rendered page source over to rvest
html_full_page <- remDr$getPageSource()[[1]] %>% read_html()

# each zone in the dropdown is one link with this class
items <- html_full_page %>%
  html_nodes('.re-GeographicSearchNext-linkItem')

# html_text() on the whole link would fuse name and count ("A Barcala69"),
# so take the name from the title attribute instead
names <- items %>%
  html_attr('title')

number <- items %>%
  html_nodes('.re-GeographicSearchNext-linkItem-count') %>%
  html_text()

link <- items %>%
  html_attr('href')

df <- cbind.data.frame(names, number, link)
head(df, 4)

        names number                                                   link
1   A Barcala     69   /es/comprar/viviendas/a-coruna-provincia/a-barcala/l
2       Arzúa    105       /es/comprar/viviendas/a-coruna-provincia/arzua/l
3    Barbanza    636    /es/comprar/viviendas/a-coruna-provincia/barbanza/l
4 Bergantiños    581 /es/comprar/viviendas/a-coruna-provincia/bergantinos/l
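
The number column comes back as text, and the larger counts on the page use a dot as thousands separator (for example 3.701), so a plain as.integer() would not give the right values. The links are also relative. Below is a minimal sketch of one way to tidy both, run offline on two links copied from the question's HTML fragment so that it does not need a browser. The variable names fragment and zones, the "span:last-child" selector, and the base URL prefix are assumptions for illustration and may need adjusting against the live page.

```r
library(rvest)

# Two links copied from the HTML fragment in the question (offline sample input)
fragment <- '<div>
  <a title="A Barcala" href="/es/comprar/viviendas/a-coruna-provincia/a-barcala/l"><span>A Barcala</span><span>69</span></a>
  <a title="Comarca de Ferrol" href="/es/comprar/viviendas/a-coruna-provincia/comarca-de-ferrol/l"><span>Comarca de Ferrol</span><span>3.698</span></a>
</div>'

links <- read_html(fragment) %>% html_nodes("a")

# In this fragment the count is the last <span> inside each link
counts <- links %>% html_node("span:last-child") %>% html_text()

zones <- data.frame(
  names  = html_attr(links, "title"),
  # drop the "." thousands separator before converting to integer
  number = as.integer(gsub(".", "", counts, fixed = TRUE)),
  # assumed base URL: prefix the relative href with the site root
  link   = paste0("https://www.fotocasa.es", html_attr(links, "href")),
  stringsAsFactors = FALSE
)

print(zones)
```

With the live page, the same cleaning steps can be applied to the number and link columns of df instead of rebuilding the data frame from a fragment.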