Webscraping using R - Table content

Time:03-11

I am new to web scraping and am trying to scrape specific data from websites.

For example: https://www.vesselfinder.com/vessels/KOTA-CARUM-IMO-9494577-MMSI-563150100

I need to scrape the distance the ship has travelled in 2020 and 2021.

library(rvest)
shipws <- read_html(shipsite)

The code above fetches the page; shipsite holds the URL.

Next, I tried:

a <- shipws %>%
  html_nodes( css = "_1hFrZ") %>%
  html_attr()

But it returns an empty result. _1hFrZ is the td class on the website. I also get an empty result when I use html_text().
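One possible cause of the empty result: in CSS, a class selector needs a leading dot, so "_1hFrZ" is read as a tag name rather than a class. html_attr() also expects the name of an attribute as its argument. The sketch below uses a hand-written HTML snippet (an assumption, not the real page) to show the difference. If the class is missing from the HTML the server returns, for instance because the page builds it with JavaScript, the dotted selector would come back empty as well.

```r
library(rvest)

# Hypothetical stand-in for the page: one cell carrying the class from the question.
page <- read_html('<table><tr><td class="_1hFrZ">98985</td></tr></table>')

# Without the dot, "_1hFrZ" is treated as an element (tag) name, so nothing matches.
no_dot <- page %>%
  html_nodes(css = "_1hFrZ") %>%
  html_text()

# With the dot, "._1hFrZ" is a class selector and matches the <td>.
with_dot <- page %>%
  html_nodes(css = "._1hFrZ") %>%
  html_text()

length(no_dot)
with_dot
```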

a <- shipsite %>%
  html() %>%
  html_nodes(xpath='//*[@id="tbc1"]/div[1]/div[1]/table') %>%
  html_table()

A few tutorials suggested the approach above, but it fails with an error saying that the html() function does not exist. If I remove html()

Would love to know where I am going wrong. Thank you.
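On the html() error: older rvest tutorials use html(), which current rvest releases have replaced with read_html(). A minimal sketch of the same XPath approach with read_html() follows. The markup is a made-up stand-in that mirrors the XPath from the question, so whether the real page has this structure is an assumption.

```r
library(rvest)

# Hypothetical markup shaped to match the XPath in the question.
page <- read_html('
  <div id="tbc1"><div><div>
    <table>
      <tr><td>Travelled distance (nm)</td><td>98985</td></tr>
      <tr><td>Port Calls</td><td>54</td></tr>
    </table>
  </div></div></div>')

# read_html() parses the document; html_nodes() then accepts the XPath.
tables <- page %>%
  html_nodes(xpath = '//*[@id="tbc1"]/div[1]/div[1]/table') %>%
  html_table()

# html_table() on a node set returns a list with one table per matched node.
tables[[1]]
```

Passing the URL string straight to html_nodes() would not work, because html_nodes() expects a parsed document rather than a character string.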

CodePudding user response:

We can get all the tables from the page with:

library(rvest)
df <- 'https://www.vesselfinder.com/vessels/KOTA-CARUM-IMO-9494577-MMSI-563150100' %>%
  read_html() %>%
  html_table()

The table of interest is the second one:

df[[2]]
# A tibble: 4 x 2
  X1                          X2
  <chr>                    <int>
1 Travelled distance (nm)  98985
2 Port Calls                  54
3 Average / Max Speed (kn)    NA
4 Min / Max Draught (m)       NA
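To pull the travelled distance out of that table as a single number, something like the following might work. It rebuilds the table by hand so that it runs offline; ship_stats and distance_nm are illustrative names, and on the live page ship_stats would be df[[2]]. Which period the figure covers is not stated in the output above.

```r
# Stand-in for df[[2]], copied from the printed output above.
ship_stats <- data.frame(
  X1 = c("Travelled distance (nm)", "Port Calls",
         "Average / Max Speed (kn)", "Min / Max Draught (m)"),
  X2 = c(98985L, 54L, NA, NA),
  stringsAsFactors = FALSE
)

# Select by label rather than by row position, in case the row order changes.
distance_nm <- ship_stats$X2[ship_stats$X1 == "Travelled distance (nm)"]
distance_nm
```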