Home > other >  Importxml formula to help scraping from web solving paging problem
Importxml formula to help scraping from web solving paging problem

Time:09-23

The site i am trying to scrape from is a marketplace with a number of pages with output results. When scraping the importxml formula is valid for the defined page (e.g. enter image description here

CodePudding user response:

use:

=INDEX(QUERY(REGEXREPLACE(HLOOKUP("×", {"×"; QUERY(FLATTEN(IFERROR(SPLIT(FLATTEN({
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=1", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""}); 
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=2", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""}); 
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=3", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""}); 
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=4", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""}); 
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=5", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""})}), ", ", ))), 
 "where Col1 is not null and not Col1 ends with 'VIN' and not Col1 contains 'ТОП'", )}, 
 SEQUENCE(5*ROWS(B:B)/11, 11, 2), ), " г.| \$$|^≈ ", ), 
 "select Col1,Col3,Col8,Col5,Col6,Col7,Col10 where Col1 is not null and not Col1 = '#REF!'", ))

0

demo sheet

  • Related