IMPORTXML without header, navigation and footer

Time:11-10

I'm using =importxml("URL-TO-SCRAPE";"//html//body//text()") to scrape the text of URLs. However, this also pulls in the content of the header, navigation and footer. How can I exclude it?

CodePudding user response:

If the output is in one cell, you can either clean it up with a regex or find a more specific XPath.

If the output is spread across multiple cells, you can try the QUERY function with the limit and offset clauses to skip the unwanted leading and trailing rows.

https://developers.google.com/chart/interactive/docs/querylanguage
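
As a rough sketch of the QUERY approach, assuming the unwanted header and navigation text fills the first 5 rows and the wanted text the next 20 (both numbers are placeholders to be adjusted per page):

```
=QUERY(IMPORTXML("URL-TO-SCRAPE";"//html//body//text()");"select Col1 limit 20 offset 5";0)
```

Because the data comes from a formula rather than a range, the column is referenced as Col1. This only works reliably if the number of header and footer rows stays the same across the pages being scraped.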

CodePudding user response:

You need to analyse the source code of URL-TO-SCRAPE to find the node that contains the text that you want to import. If the DOM is static (that is, it is not modified by JavaScript), then you can use Chrome DevTools or a similar tool to get a better XPath.

  1. Right-click the text that you want to import and select Inspect
    • This will open the Elements tab of the browser's dev tools
  2. Find the parent element that contains the text to be imported
  3. Right-click the element and select Copy > Copy XPath
  4. Adapt the XPath for use in IMPORTXML and put it in the formula in place of the current XPath parameter.
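
As a sketch of step 4, suppose the copied XPath were //*[@id="content"] (a hypothetical id; the real page will have its own). Inside the formula, the inner double quotes need to become single quotes, and //text() can be appended to get the text nodes:

```
=IMPORTXML("URL-TO-SCRAPE";"//*[@id='content']//text()")
```

If there is no single container element, an alternative is to keep the original path and filter out the text nodes that sit under the unwanted areas. This assumes the page actually marks them up with header, nav and footer elements, which is not guaranteed:

```
=IMPORTXML("URL-TO-SCRAPE";"//body//text()[not(ancestor::header or ancestor::nav or ancestor::footer)]")
```

If the page uses div elements with ids or classes instead, the ancestor tests would have to be adapted accordingly.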