New to webscraping. I am trying to scrape specific data from websites.
For eg. https://www.vesselfinder.com/vessels/KOTA-CARUM-IMO-9494577-MMSI-563150100
I need to scrape the distance the ship has travelled in 2020 and 2021.
shipws <- read_html(shipsite)
The above code gets me the site. shipsite is the url.
Now, I tried using,
a <- shipws %>%
html_nodes( css = "_1hFrZ") %>%
html_attr()
But it returns a empty. _1hFrZ was the td class in the website. It returns empty when I use html_text() too.
a <- shipsite %>%
html() %>%
html_nodes(xpath='//*[@id="tbc1"]/div[1]/div[1]/table') %>%
html_table()
Few tutorials asked me to do it above way and that turned up with errors that html() function does not exist. If I remove html()
Would love to know where I am going wrong. Thank you.
CodePudding user response:
We can just get all the tables from website by,
df = 'https://www.vesselfinder.com/vessels/KOTA-CARUM-IMO-9494577-MMSI-563150100' %>%
read_html() %>% html_table()
The table of interest is,
df[[2]]
# A tibble: 4 x 2
X1 X2
<chr> <int>
1 Travelled distance (nm) 98985
2 Port Calls 54
3 Average / Max Speed (kn) NA
4 Min / Max Draught (m) NA