library(httr)
library(XML)

url <- "https://finance.yahoo.com/calendar/earnings?from=2022-12-04&to=2022-12-10&day=2022-12-06"
# Fetch the page and return every HTML table found in it
download_table <- function(url) {
  url_file <- GET(url)
  web_page_parsed <- htmlParse(url_file)
  readHTMLTable(web_page_parsed)
}
tables <- download_table(url)
print(head(tables))
I used this for Yahoo and it worked. But then I tried the same approach for:
url <- "https://www.benzinga.com/calendars/earnings"
# Same approach as above, pointed at Benzinga
download_table <- function(url) {
  url_file <- GET(url)
  web_page_parsed <- htmlParse(url_file)
  readHTMLTable(web_page_parsed)
}
tables <- download_table(url)
print(head(tables))
tables$`NULL`
And I got no usable tables as a result, only this:
> print(head(tables))
$`NULL`
Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
1 Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
1 Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
$`NULL`
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1
2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> tables$`NULL`
Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
1 Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
1 Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
>
If I search the page source for, say, the tickers, I can't find them, so I can't use the rvest package to scrape them.
Does anyone have an idea how to do this with Benzinga?
Thank you and KR
Web Scraping Benzinga Earnings Calendar with rvest and httr
CodePudding user response:
The data is pulled from an API that you can see in the Network tab of the browser's developer tools (right-click, Inspect).
The endpoint is the URL used in the function below.
You can then create a function that alters the dates and filters for the tickers of interest (the [tickers] parameter). I wrote one here as a suggestion with httr2, where the function takes from_date and to_date as input.
library(tidyverse)
library(httr2)
get_earnings <- function(from_date, to_date) {
  str_c(
    "https://api.benzinga.com/api/v2.1/calendar/earnings?token=1c2735820e984715bc4081264135cb90&parameters[date_from]=",
    from_date,
    "&parameters[date_to]=",
    to_date,
    "&parameters[tickers]=&pagesize=1000"
  ) %>%
    request() %>%
    req_headers(accept = "application/json") %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    pluck("earnings") %>%
    as_tibble() %>%
    type_convert()
}
get_earnings(from_date = "2023-01-01", to_date = "2023-01-25")
# A tibble: 387 × 25
currency date date_confirmed eps eps_est eps_prior eps_surprise eps_surprise_per…
<chr> <date> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 USD 2023-01-25 1 0.91 0.58 0.57 0.33 0.569
2 USD 2023-01-25 1 NA 1.27 1.42 NA NA
3 USD 2023-01-25 1 1 0.97 0.92 0.03 0.0309
4 USD 2023-01-25 1 1.01 1.13 0.95 -0.12 -0.106
5 USD 2023-01-25 1 0.69 NA 0.93 NA NA
6 USD 2023-01-25 1 0.12 0.13 0.16 -0.01 -0.0769
7 USD 2023-01-25 1 1.5 1.43 1.05 0.07 0.049
8 USD 2023-01-25 1 1.1 0.98 0.69 0.12 0.122
9 USD 2023-01-25 1 0.02 0.01 -0.65 0.01 1
10 USD 2023-01-25 1 0.42 0.44 0.5 -0.02 -0.0455
# … with 377 more rows, and 17 more variables: eps_type <chr>, exchange <chr>, id <chr>,
# importance <int>, name <chr>, notes <chr>, period <chr>, period_year <int>,
# revenue <dbl>, revenue_est <dbl>, revenue_prior <dbl>, revenue_surprise <dbl>,
# revenue_surprise_percent <dbl>, revenue_type <chr>, ticker <chr>, time <time>,
# updated <int>
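The function above leaves the [tickers] parameter empty, so it returns every company in the date range. If you only want a few tickers, something along these lines might work. This is a rough sketch, not tested against the live API: build_earnings_url and get_earnings_for are hypothetical helper names, and I am assuming the endpoint accepts a comma-separated ticker list. The token is passed in as an argument instead of being hard-coded.

```r
# Build the request URL. `tickers` is a character vector; an empty vector
# leaves the parameter blank, which matches the URL used above.
# Assumption: several tickers can be sent as one comma-separated value.
build_earnings_url <- function(from_date, to_date, token, tickers = character()) {
  paste0(
    "https://api.benzinga.com/api/v2.1/calendar/earnings",
    "?token=", token,
    "&parameters[date_from]=", from_date,
    "&parameters[date_to]=", to_date,
    "&parameters[tickers]=", paste(tickers, collapse = ","),
    "&pagesize=1000"
  )
}

# Hypothetical helper: same request as get_earnings(), restricted to `tickers`.
# httr2 functions are namespaced so nothing is loaded until the call is made.
get_earnings_for <- function(from_date, to_date, token, tickers = character()) {
  req <- httr2::request(build_earnings_url(from_date, to_date, token, tickers))
  req <- httr2::req_headers(req, accept = "application/json")
  resp <- httr2::req_perform(req)
  # The records sit under the "earnings" element of the JSON body
  httr2::resp_body_json(resp, simplifyVector = TRUE)$earnings
}
```

You would call it as get_earnings_for("2023-01-01", "2023-01-25", token, c("AAPL", "MSFT")), with token set to the value seen in the network request. If the API ignores the ticker list, filtering the tibble from get_earnings() on its ticker column gives the same result.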