Web Scraping Earnings Calendar

Time:01-26

library(httr)
library(XML)

url <- "https://finance.yahoo.com/calendar/earnings?from=2022-12-04&to=2022-12-10&day=2022-12-06"

# Download a page and return every HTML table found in it
download_table <- function(url) {
  url_file <- GET(url)
  web_page_parsed <- htmlParse(url_file)
  readHTMLTable(web_page_parsed)
}

tables <- download_table(url)
print(head(tables))

This worked for Yahoo. But when I tried the same approach on Benzinga:

url <- "https://www.benzinga.com/calendars/earnings"

# Same helper as for Yahoo: fetch the page and read all of its HTML tables
download_table <- function(url) {
  url_file <- GET(url)
  web_page_parsed <- htmlParse(url_file)
  readHTMLTable(web_page_parsed)
}

tables <- download_table(url)
print(head(tables))
tables$`NULL`

Instead of the table contents, I only got the header row and an empty table:

> print(head(tables))
$`NULL`
  Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
1 Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
  Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
1 Prior Rev Est Rev Actual Rev Rev Surprise Get Alert

$`NULL`
  V1   V2   V3   V4   V5   V6   V7   V8   V9  V10  V11  V12  V13
1                                                               
2    <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>

> tables$`NULL`
  Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
1 Date time ticker Quarter Prior EPS Est EPS Actual EPS EPS Surprise
  Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
1 Prior Rev Est Rev Actual Rev Rev Surprise Get Alert
> 

If I search the page source for, say, the tickers, I can't find them. So I can't use the rvest package to scrape them.

Does anyone have an idea how to do this with Benzinga?

Thank you and KR

Web Scraping Benzinga Earnings Calendar with rvest and httr

CodePudding user response:

The table is not in the page's HTML. The page fills it in with data pulled from an API, and you can see that request in the Network tab of the browser's developer tools (right-click, Inspect).

The link is as follows:

https://api.benzinga.com/api/v2.1/calendar/earnings?token=1c2735820e984715bc4081264135cb90&parameters[date_from]=2023-01-25&parameters[date_to]=2023-01-25&parameters[tickers]=&pagesize=1000
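To see which parts of that link can be varied, it may help to split the query string into its parameters. The following is only a rough base R sketch: `parse_query` is a hypothetical helper written for this illustration, and the token is replaced by the placeholder `YOUR_TOKEN`.

```r
# Split a URL's query string into a named character vector (base R only).
# parse_query is a hypothetical helper, not part of any package.
parse_query <- function(url) {
  query  <- sub("^[^?]*\\?", "", url)               # drop everything up to the first "?"
  pairs  <- strsplit(query, "&", fixed = TRUE)[[1]] # one "key=value" string per parameter
  keys   <- sub("=.*$", "", pairs)                  # text before the first "="
  values <- sub("^[^=]*=?", "", pairs)              # text after the first "="
  setNames(values, keys)
}

# Same structure as the link above, with a placeholder instead of the token
api_url <- paste0(
  "https://api.benzinga.com/api/v2.1/calendar/earnings",
  "?token=YOUR_TOKEN",
  "&parameters[date_from]=2023-01-25",
  "&parameters[date_to]=2023-01-25",
  "&parameters[tickers]=",
  "&pagesize=1000"
)

params <- parse_query(api_url)
print(params)
```

The printed vector shows the date range, an empty ticker filter, and the page size as separate entries, which are the pieces a wrapper function would vary.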

You can then write a function that changes the dates and filters for the tickers of interest (the [tickers] parameter). Here is one suggestion using httr2, where the function takes from_date and to_date as input.

library(tidyverse)
library(httr2)

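get_earnings <- function(from_date, to_date) {
  # Build the API URL for the requested date range (ticker filter left empty)
  str_c(
    "https://api.benzinga.com/api/v2.1/calendar/earnings?token=1c2735820e984715bc4081264135cb90&parameters[date_from]=",
    from_date,
    "&parameters[date_to]=",
    to_date,
    "&parameters[tickers]=&pagesize=1000"
  ) %>%
    request() %>%
    # Ask for JSON explicitly
    req_headers(accept = "application/json") %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    # The records are in the "earnings" element of the response
    pluck("earnings") %>%
    as_tibble() %>%
    # Guess column types (dates, numbers) from the character values
    type_convert()
}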

get_earnings(from_date = "2023-01-01", to_date = "2023-01-25")

# A tibble: 387 × 25
   currency date       date_confirmed   eps eps_est eps_prior eps_surprise eps_surprise_per…
   <chr>    <date>              <int> <dbl>   <dbl>     <dbl>        <dbl>             <dbl>
 1 USD      2023-01-25              1  0.91    0.58      0.57         0.33            0.569 
 2 USD      2023-01-25              1 NA       1.27      1.42        NA              NA     
 3 USD      2023-01-25              1  1       0.97      0.92         0.03            0.0309
 4 USD      2023-01-25              1  1.01    1.13      0.95        -0.12           -0.106 
 5 USD      2023-01-25              1  0.69   NA         0.93        NA              NA     
 6 USD      2023-01-25              1  0.12    0.13      0.16        -0.01           -0.0769
 7 USD      2023-01-25              1  1.5     1.43      1.05         0.07            0.049 
 8 USD      2023-01-25              1  1.1     0.98      0.69         0.12            0.122 
 9 USD      2023-01-25              1  0.02    0.01     -0.65         0.01            1     
10 USD      2023-01-25              1  0.42    0.44      0.5         -0.02           -0.0455
# … with 377 more rows, and 17 more variables: eps_type <chr>, exchange <chr>, id <chr>,
#   importance <int>, name <chr>, notes <chr>, period <chr>, period_year <int>,
#   revenue <dbl>, revenue_est <dbl>, revenue_prior <dbl>, revenue_surprise <dbl>,
#   revenue_surprise_percent <dbl>, revenue_type <chr>, ticker <chr>, time <time>,
#   updated <int>
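
If you also want to filter by ticker, one possible variant is sketched below. It rests on assumptions I have not checked against the API: that several tickers can be passed as one comma-separated value in parameters[tickers], and that the response keeps the same "earnings" element. The names `build_earnings_url` and `get_earnings_for` and the `token` argument are made up for this example.

```r
# Build the request URL; pure string manipulation, so it can be checked offline.
# Assumption: several tickers go into parameters[tickers] as a comma-separated list.
build_earnings_url <- function(from_date, to_date, tickers = character(), token) {
  paste0(
    "https://api.benzinga.com/api/v2.1/calendar/earnings",
    "?token=", token,
    "&parameters[date_from]=", from_date,
    "&parameters[date_to]=", to_date,
    "&parameters[tickers]=", paste(tickers, collapse = ","),
    "&pagesize=1000"
  )
}

# Fetch the earnings for a date range and an optional set of tickers.
# Requires the httr2 and tibble packages; this performs a network request.
get_earnings_for <- function(from_date, to_date, tickers = character(), token) {
  req  <- httr2::request(build_earnings_url(from_date, to_date, tickers, token))
  req  <- httr2::req_headers(req, accept = "application/json")
  resp <- httr2::req_perform(req)
  body <- httr2::resp_body_json(resp, simplifyVector = TRUE)
  tibble::as_tibble(body[["earnings"]])
}

# Inspect the URL without calling the API
example_url <- build_earnings_url("2023-01-01", "2023-01-25",
                                  tickers = c("AAPL", "MSFT"),
                                  token = "YOUR_TOKEN")
print(example_url)

# Usage (not run here, needs a valid token and network access):
# get_earnings_for("2023-01-01", "2023-01-25", c("AAPL", "MSFT"), token = "YOUR_TOKEN")
```

Keeping the URL construction separate from the request makes it easy to check what will be sent before hitting the API.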