Google Sheets has a cool function called importxml. It lets you scrape data from public web sources e.g. a URL's title.
You can write an XPath expression e.g. //title/text()
and it will find all occurrences of some value and display those in your Google Sheet as separate rows. Nifty!
If the contents of <element></element>
contain HTML markup e.g. <element>some<b>data</b>is<em>here</em>.
then Google Sheets gives you two options:
- if you append /text() at the end of your XPath then the result is the direct text value of the element e.g.
some here
- if you do not append /text() at the end of your XPath then the result is the entire text split across multiple cells in the row. You end up with
|some|data|is|here|
. (where | acts as my cell divider)
I am trying to get some data is here
in a single cell, not across multiple cells on my row. Is there a way to do that?
Using textjoin
or concatenate on importxml
doesn't work because it then applies it to the entire result and if there are multiple instances of <element>some<b>data</b>is<em>here</em>.
then I get them all in a single cell instead of one result per row (which is what I want).
CodePudding user response:
try:
=TEXTJOIN(" "; 1; your_importxml_here)
for array use:
=FLATTEN(QUERY(TRANSPOSE(your_importxml_here);;9^9))