I found this post that shows how to search for news articles on Google using R: Scraping Google News with Rvest for Keywords
This post shows how to search for a single term, for example:
keyword <- "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en"
Can the code above be modified to search for multiple terms? For example, suppose I want to search for news articles that contain BOTH "iphone" and "covid".
Could I write the query like this?
library(tidyRSS)
# I have a feeling that "IN" stands for "India". If I want to change this to "Canada", I think I need to replace "IN" with "CAN"?
keyword <- "https://news.google.com/rss/search?q=apple&q=covid&hl=en-IN&gl=IN&ceid=IN:en"
# From the package vignette
google_news <- tidyfeed(
keyword,
clean_tags = TRUE,
parse_dates = TRUE
)
Is this correct?
Thank you!
PS: I wonder if there is a way to restrict the dates between which the search will be performed?
CodePudding user response:
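For multiple terms, join them with OR if either one may be present, or with AND if both need to be present. Similarly, hl stands for the language and gl for the country. The country code is the two-letter form, so Canada should be CA rather than CAN (for example hl=en-CA&gl=CA&ceid=CA:en). In addition, for date ranges, use the before: and after: keywords inside the query: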
library(tidyRSS)
keyword <- "https://news.google.com/rss/search?q=apple AND covid after:2022-07-01 before:2022-08-02&hl=en-US&gl=US&ceid=US:en"
google_news <- tidyfeed(
keyword,
clean_tags = TRUE,
parse_dates = TRUE
)
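If the query string above has to be assembled for different terms or countries, it may be easier to build it programmatically. The following is only a minimal sketch: build_news_url() and its arguments are hypothetical names (not part of tidyRSS), and it assumes the search accepts a percent-encoded query in the same way as the hand-written one.

```r
# Hypothetical helper (not part of tidyRSS): assembles a Google News RSS search URL.
build_news_url <- function(terms, operator = "AND", after = NULL, before = NULL,
                           lang = "en", country = "US") {
  # Join the search terms with the chosen operator, e.g. "apple AND covid"
  query <- paste(terms, collapse = paste0(" ", operator, " "))
  # Append the optional date filters (YYYY-MM-DD strings)
  if (!is.null(after))  query <- paste0(query, " after:", after)
  if (!is.null(before)) query <- paste0(query, " before:", before)
  # Percent-encode spaces and colons so the query is a valid URL component
  encoded <- utils::URLencode(query, reserved = TRUE)
  paste0(
    "https://news.google.com/rss/search?q=", encoded,
    "&hl=", lang, "-", country,
    "&gl=", country,
    "&ceid=", country, ":", lang
  )
}

# Example: both terms, restricted to a date range, Canadian edition
keyword <- build_news_url(c("apple", "covid"),
                          after = "2022-07-01", before = "2022-08-02",
                          country = "CA")
```

The resulting keyword can then be passed to tidyfeed() exactly as in the code above.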
Checking the date ranges:
library(dplyr)
> all(between(as.Date(google_news$feed_pub_date),
as.Date("2022-07-01"), as.Date("2022-08-02")))
[1] TRUE
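The check above looks at feed_pub_date. As far as I know, tidyfeed() also returns a per-article item_pub_date column, which may be the more informative one to test. A minimal sketch, assuming that column is present; items_within() is a hypothetical helper, and mock_feed is a stand-in data frame so the example runs without a network call:

```r
# Hypothetical helper: TRUE if every article date falls within [from, to].
# Assumes `feed` has an item_pub_date column (an assumption about tidyfeed() output).
items_within <- function(feed, from, to) {
  dates <- as.Date(feed$item_pub_date)
  all(dates >= as.Date(from) & dates <= as.Date(to), na.rm = TRUE)
}

# Stand-in for the tidyfeed() result, with made-up dates for illustration only
mock_feed <- data.frame(
  item_pub_date = as.POSIXct(c("2022-07-05 10:00:00", "2022-07-30 18:30:00"),
                             tz = "UTC")
)

in_range <- items_within(mock_feed, "2022-07-01", "2022-08-02")
```

With the real feed, the call would be items_within(google_news, "2022-07-01", "2022-08-02").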