I have been scraping a table from this website successfully ever since user hrbrmstr answered this question of mine 5 years ago. Recently something on the website changed, and I can no longer fetch the data.
URL <- "http://www.fiskistofa.is/veidar/aflaupplysingar/landanir-eftir-hofnum/"
library(httr)
library(rvest)
res <- POST(url = URL,
            query = list(lang = "is"),
            body = list(magn = "Sundurlidun",
                        hofn = "87",
                        dagurFra = format(lubridate::today() - 4, "%d.%m.%Y"),
                        dagurTil = format(lubridate::today(), "%d.%m.%Y"),
                        hnappur = "Sækja"),
            encode = "form")
doc <- content(res, as="parsed")
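With encode = "form", httr sends these fields as an application/x-www-form-urlencoded request body. A minimal Python sketch of that body, using only the standard library, can help when checking what the server receives. The field names and values come from the question; build_form_body, its parameters and the fixed end date are hypothetical, chosen so the output is reproducible.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_form_body(end: date, days_back: int = 4, harbour: str = "87") -> str:
    """Hypothetical helper: encode the landings form fields as a form POST body."""
    fields = {
        "magn": "Sundurlidun",
        "hofn": harbour,
        # Dates as day.month.year, e.g. 25.11.2021
        "dagurFra": (end - timedelta(days=days_back)).strftime("%d.%m.%Y"),
        "dagurTil": end.strftime("%d.%m.%Y"),
        "hnappur": "Sækja",
    }
    # urlencode percent-encodes non-ASCII text as UTF-8 by default
    return urlencode(fields)


print(build_form_body(date(2021, 11, 29)))
# magn=Sundurlidun&hofn=87&dagurFra=25.11.2021&dagurTil=29.11.2021&hnappur=S%C3%A6kja
```

Whether httr escapes "Sækja" byte-for-byte the same way is not verified here; the point is the field layout and the day.month.year dates.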
This is how I used to find and extract the table, but now the output is empty:
html_nodes(doc, xpath = ".//table[contains(., 'Magn')]") %>%
  html_table(header = TRUE)
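The XPath .//table[contains(., 'Magn')] matches every table whose full text content contains "Magn". When it returns nothing, a quick check is whether the raw response (for example content(res, as = "text")) contains such a table at all; if it does not, the problem lies in the request rather than in the selector. The sketch below approximates that XPath test with Python's standard-library html.parser. It is not how rvest evaluates XPath, and TableTextCollector and tables_containing are hypothetical names.

```python
from html.parser import HTMLParser


class TableTextCollector(HTMLParser):
    """Collect the concatenated text of every <table>, including nested ones."""

    def __init__(self):
        super().__init__()
        self.buffers = []  # one text buffer per currently open <table>
        self.tables = []   # finished table texts, in closing order

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.buffers.append([])

    def handle_endtag(self, tag):
        # Tables that are never closed are not reported
        if tag == "table" and self.buffers:
            self.tables.append("".join(self.buffers.pop()))

    def handle_data(self, data):
        # Text counts toward every enclosing table, like an XPath string value
        for buf in self.buffers:
            buf.append(data)


def tables_containing(html: str, needle: str) -> list:
    """Return the text of each table whose content contains `needle`."""
    parser = TableTextCollector()
    parser.feed(html)
    parser.close()
    return [text for text in parser.tables if needle in text]


page = ("<table><tr><td>Löndun dags</td><td>Magn</td></tr></table>"
        "<table><tr><td>other</td></tr></table>")
print(len(tables_containing(page, "Magn")))  # 1
```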
Nothing in the site's appearance has changed, but they recently opened this Power BI report for the database (the table is on page 3), so they may also have changed something behind the scenes that I don't know about.
Any suggestions?
Answer:
Keep the dates in '%d.%m.%Y' format (day.month.year), and change the URL from http:// to https://:
# Same request as before, but against https:// instead of http://
URL <- "https://www.fiskistofa.is/veidar/aflaupplysingar/landanir-eftir-hofnum/"
library(httr)
library(rvest)
res <- POST(url = URL,
            query = list(lang = "is"),
            body = list(magn = "Sundurlidun",
                        hofn = "87",
                        # dates as day.month.year, e.g. 25.11.2021
                        dagurFra = format(lubridate::today() - 4, '%d.%m.%Y'),
                        dagurTil = format(lubridate::today(), '%d.%m.%Y'),
                        hnappur = "Sækja"),
            encode = "form")
doc <- content(res, as = "parsed")
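One plausible reason the https:// change helps, assuming the site now answers http:// requests with a 301 or 302 redirect to https:// (the question does not confirm this): when a client follows such a redirect after a POST, it typically re-sends the request as a GET without the form body. Python's urllib and requests do this, and libcurl, which httr uses through the curl package, does the same by default for 301, 302 and 303 when following redirects. The server then returns the page without search results, so the XPath finds no table. The self-contained sketch below reproduces the effect against a throwaway local server; RedirectThenEcho, post_through_redirect and the /old and /new paths are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectThenEcho(BaseHTTPRequestHandler):
    """/old answers 301 -> /new; /new replies with the method and body size it saw."""

    def do_GET(self):
        self._respond()

    def do_POST(self):
        self._respond()

    def _respond(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)  # always drain the request body
        if self.path == "/old":
            # Stand-in for an http:// address that now redirects to https://
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        reply = f"{self.command} {len(body)}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # silence per-request logging


def post_through_redirect(form_body: bytes) -> str:
    """POST to /old and return what the final endpoint reports receiving."""
    server = HTTPServer(("127.0.0.1", 0), RedirectThenEcho)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Empty ProxyHandler so environment proxy settings cannot intercept localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = f"http://127.0.0.1:{server.server_port}/old"
        with opener.open(url, data=form_body) as resp:
            return resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()


print(post_through_redirect(b"magn=Sundurlidun&hofn=87"))  # GET 0
```

It prints GET 0: the form data never reached /new. Posting straight to the final https:// URL avoids the redirect; in R, comparing res$url with URL is one way to spot that a redirect happened.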
In Python:
import requests
import pandas as pd
from datetime import datetime, timedelta
url = "https://www.fiskistofa.is/veidar/aflaupplysingar/landanir-eftir-hofnum/"
today = datetime.now()
# Form fields: landings at harbour code 87 over the last 4 days (dates as day.month.year)
payload = {
    'magn': "Sundurlidun",
    'hofn': "87",
    'dagurFra': (today - timedelta(days=4)).strftime("%d.%m.%Y"),
    'dagurTil': today.strftime("%d.%m.%Y"),
    'hnappur': "Sækja",
}
# POST the form to the https:// URL and keep the last table on the returned page
df = pd.read_html(requests.post(url, data=payload).text)[-1]
Output:
print(df)
0 1 ... 4 5
0 Löndun dags Skipnr. ... Vörutegund Magn
1 25.11.2021 2999 ... Steinbítur /slægður 5
2 25.11.2021 2999 ... ÝSA/ÓSL./VS (HAFRO) 690
3 25.11.2021 2999 ... Ýsa /óslægð 415
4 25.11.2021 2999 ... ÞORSKUR/ÓSL./VS (HAFRO) 861
5 25.11.2021 2999 ... Þorskur / óslægður 4.870
6 26.11.2021 2615 ... ÝSA/ÓSL./VS (HAFRO) 14
7 26.11.2021 2615 ... Ýsa /óslægð 1.005
8 26.11.2021 2615 ... ÞORSKUR/ÓSL./VS (HAFRO) 164
9 26.11.2021 2615 ... Þorskur / óslægður 1.507
10 27.11.2021 2842 ... ÞORSKUR/ÓSL./VS (HAFRO) 271
11 27.11.2021 2842 ... Þorskur / óslægður 5.703
12 27.11.2021 2842 ... Þorskur-undirmál/ósl 151
13 27.11.2021 2842 ... Hlýri /óslægður 13
14 27.11.2021 2842 ... Gullkarfi 27
15 27.11.2021 2842 ... Ufsi /óslægður 29
16 27.11.2021 2842 ... Keila /óslægð 11
17 27.11.2021 2842 ... Lýsa /óslægð 2
18 27.11.2021 2842 ... Ýsa /óslægð 3.072
19 27.11.2021 2842 ... Ýsa-undirmál/óslægð 8
20 28.11.2021 2256 ... Ýsa /óslægð 1.888
21 28.11.2021 2256 ... Þorskur-undirmál/ósl 551
22 28.11.2021 2256 ... Þorskur / óslægður 4.212
23 28.11.2021 2256 ... Steinbítur /slægður 4
24 28.11.2021 2256 ... ÝSA/ÓSL./VS (HAFRO) 243
25 28.11.2021 2615 ... Ýsa /óslægð 829
26 28.11.2021 2615 ... Þorskur / óslægður 2.659
27 28.11.2021 2615 ... Gullkarfi 34
28 28.11.2021 2842 ... Keila /óslægð 11
29 28.11.2021 2842 ... Gullkarfi 18
30 28.11.2021 2842 ... ÞORSKUR/ÓSL./VS (HAFRO) 95
31 28.11.2021 2842 ... Hlýri /óslægður 17
32 28.11.2021 2842 ... Þorskur-undirmál/ósl 79
33 28.11.2021 2842 ... Langa /óslægð 18
34 29.11.2021 1136 ... Tindabikkja 599
35 29.11.2021 1136 ... Þorsklifur 1.787
[36 rows x 6 columns]
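The printed table still carries the header row as data (row 0) and, judging by values such as 4.870 and 1.005, uses "." as a thousands separator in Magn (the Icelandic convention). A minimal pandas sketch that promotes row 0 to column names and converts the dates and amounts, assuming that layout; tidy_landings is a hypothetical helper, and only the Löndun dags and Magn column names are taken from the output above.

```python
import pandas as pd


def tidy_landings(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: use row 0 as headers, parse dates and amounts.

    Assumes dates are day.month.year and that "." in Magn is a
    thousands separator, so "4.870" becomes 4870.
    """
    df = raw.iloc[1:].copy()
    df.columns = raw.iloc[0].tolist()
    df = df.reset_index(drop=True)
    df["Löndun dags"] = pd.to_datetime(df["Löndun dags"], format="%d.%m.%Y")
    # Drop the thousands separator before converting to numbers
    df["Magn"] = pd.to_numeric(df["Magn"].astype(str).str.replace(".", "", regex=False))
    return df
```

Call it as tidy_landings(df) on the result of pd.read_html(...)[-1]. Alternatively, passing header=0 to pd.read_html takes the column names from the first row directly, leaving only the conversions to do.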