Querying a page and scraping it using Sheets

Time:10-16

I want to use Sheets to query pages from Wikidata and scrape a specific section, but I couldn't find anything focused specifically on this, and since I'm a beginner I don't really know where to start. I have a list of Q identifiers and I'd like to use them to query each page, then check whether a specific section exists there (or scrape the data from it if possible), and otherwise return false. I started with what I found: [image]

To check for a section such as 'date of death', try:

=QUERY(IMPORTXML("https://www.wikidata.org/wiki/"&A1, "//*"), 
 "select Col2 where Col1 = 'date of death' and Col2 is not null")<>""

and to return FALSE when there is no match:

=IFERROR(QUERY(IMPORTXML("https://www.wikidata.org/wiki/"&A1, "//*"), 
 "select Col2 where Col1 = 'date of death' and Col2 is not null"), FALSE)


=IFERROR(REGEXEXTRACT(QUERY(IMPORTXML("https://www.wikidata.org/wiki/"&A2, "//*"), 
 "select Col2 where Col1 = 'date of birth' and Col2 is not null"), 
 "(.*) \d.*reference.*"), FALSE)
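
To see what the REGEXEXTRACT pattern keeps, here is a minimal Python sketch of the same regular expression. The sample cell text ("14 March 1879 2 references") is only an assumption about what the scraped Col2 value might look like, and the helper name extract_value is hypothetical. For a pattern this simple, Python's re module should behave like the engine Sheets uses, but treat that as an assumption too.

```python
import re

# Same pattern as in the REGEXEXTRACT call above.
PATTERN = re.compile(r"(.*) \d.*reference.*")


def extract_value(cell_text):
    """Return the text before the trailing '<n> reference(s)' part.

    Returns False when the pattern does not match, mirroring
    IFERROR(..., FALSE) in the formula. (Hypothetical helper.)
    """
    match = PATTERN.search(cell_text)
    return match.group(1) if match else False


# Assumed example of scraped cell text; real pages may differ.
print(extract_value("14 March 1879 2 references"))
# A value without a reference count does not match the pattern.
print(extract_value("no statement here"))
```

The greedy `(.*)` captures everything up to the last space followed by a digit that still has "reference" after it, so the reference count is dropped and only the value itself is kept.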