Include Multiple Search Terms in an HTML Request

Time:08-28

I found this post that shows how to search for news articles on Google using R: Scraping Google News with Rvest for Keywords

This post shows how to search for a single term, for example: keyword <- "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en"

  • Can the code above be modified to search for multiple terms? For example, suppose I want to search for news articles that contain BOTH "iphone" and "covid".

Could I write the query like this?

library(tidyRSS)

# I have a feeling that "IN" stands for "India". If I want to change this to "Canada", I think I need to replace "IN" with "CAN"?

keyword <- "https://news.google.com/rss/search?q=apple&q=covid&hl=en-IN&gl=IN&ceid=IN:en"

# From the package vignette

google_news <- tidyfeed(
    keyword,
    clean_tags = TRUE,
    parse_dates = TRUE
)

Is this correct?

Thank you!

PS: Is there a way to restrict the dates between which the search is performed?

CodePudding user response:

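Repeating the q parameter is not needed. Put all the terms in a single q value and join them with OR if either term may appear, or with AND if both must be present. The hl parameter sets the language and gl sets the country; both use two-letter codes, so Canada would be CA rather than CAN (for example hl=en-CA&gl=CA&ceid=CA:en). For date ranges, add the after: and before: keywords to the query: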

library(tidyRSS)
keyword <- "https://news.google.com/rss/search?q=apple AND covid after:2022-07-01 before:2022-08-02&hl=en-US&gl=US&ceid=US:en"
google_news <- tidyfeed(
    keyword,
    clean_tags = TRUE,
    parse_dates = TRUE
)
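If the terms or dates change often, the URL can be assembled from its parts instead of being typed by hand. The following is only a minimal sketch: build_news_url() is a hypothetical helper, not part of tidyRSS, and it assumes that Google News accepts a percent-encoded query in the same way as the literal string used above.

```r
# Hypothetical helper (not part of tidyRSS): assemble a Google News RSS
# search URL from search terms, an operator, and optional date bounds.
build_news_url <- function(terms, operator = c("AND", "OR"),
                           after = NULL, before = NULL,
                           lang = "en", country = "US") {
  operator <- match.arg(operator)
  # Join the terms with the chosen boolean operator, e.g. "apple AND covid".
  query <- paste(terms, collapse = paste0(" ", operator, " "))
  # Append the optional date filters described in the answer.
  if (!is.null(after))  query <- paste0(query, " after:", after)
  if (!is.null(before)) query <- paste0(query, " before:", before)
  paste0(
    "https://news.google.com/rss/search?q=",
    utils::URLencode(query, reserved = TRUE),  # escape spaces and colons
    "&hl=", lang, "-", country,
    "&gl=", country,
    "&ceid=", country, ":", lang
  )
}

keyword <- build_news_url(c("apple", "covid"),
                          after = "2022-07-01", before = "2022-08-02")
```

The resulting keyword can be passed to tidyfeed() exactly as before. Switching to Canada would then be a matter of calling build_news_url(..., country = "CA").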

Checking the date range:

library(dplyr)
> all(between(as.Date(google_news$feed_pub_date), 
   as.Date("2022-07-01"), as.Date("2022-08-02")))
[1] TRUE
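
Note that the check above uses feed_pub_date, which, as far as I can tell, describes the feed as a whole rather than the individual articles. To test the articles themselves, the same idea can be applied to the per-item dates. A minimal sketch, assuming the tidyfeed() result has an item_pub_date column; items_in_range() is a hypothetical helper and the toy data frame only stands in for a real result:

```r
# Hypothetical helper: TRUE if every item date lies within [from, to].
items_in_range <- function(feed, from, to) {
  item_dates <- as.Date(feed$item_pub_date)
  all(item_dates >= as.Date(from) & item_dates <= as.Date(to))
}

# Toy data frame standing in for a tidyfeed() result (made-up dates).
toy_feed <- data.frame(
  item_pub_date = as.POSIXct(c("2022-07-05 10:00:00", "2022-07-30 18:30:00"),
                             tz = "UTC")
)

items_in_range(toy_feed, "2022-07-01", "2022-08-02")
```

With a real feed, the call would be items_in_range(google_news, "2022-07-01", "2022-08-02").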