I know how to count a series of keywords in Excel. I use this formula:
=SUMPRODUCT(--ISNUMBER(SEARCH($CE$2:$CE$43,(G2:AP2))))
However, what would be the Excel formula if I want to count the number of keywords that exist only within /-3 words around "risk" in the selected rows?
Consider this sentence: "Political uncertainty generates economic risk which stagnates economic activities." If my keywords are political
, uncertainty
, stagnates
, and economic
, the total number of keywords within /- 3 words around "risk" will be 3, i.e., uncertainty
, stagnates
, and economic
. political
will be excluded since it is out of range.
CodePudding user response:
You may try the following formula if you have Microsoft-365 with most recent release.
=LET(x,TOCOL(TEXTSPLIT(A1," ")),y,MATCH("risk",x,0),z,INDEX(x,SEQUENCE(y 2,1,y-3)),COUNT(XMATCH(D1:D4,z,0)))
CodePudding user response:
You could try:
Formula in E1
:
=SUM(--ISNUMBER(MATCH(FILTERXML("<t><s>"&SUBSTITUTE("a a a "&A1," ","</s><s>")&"</s></t>","//s[following::s[4]='risk']/following::*[position()<8]")&"*",C1:C4,0)))
This would in order:
"<t><s>"&SUBSTITUTE("a a a "&A1," ","</s><s>")&"</s></t>"
- Create a valid xml-string to process with xpath;"//s[following::s[4]='risk']/following::*[position()<8]"
- A valid xpath 1.0 expression to retrieve only upto seven nodes withrisk
in the middle. In more detail://s[following::s[4]='risk']
will get the node where the following sibling on 4 indices to it's right will equalrisk
(case-sensitive), we then take all following siblings from that position with/following::*
where we limit the returned nodes to a max of seven with[position()<8]
. Now it would also make sense why we concatenatea a a
with the string fromA1
sincerisk
could appear at the start of the string or less then three words in;MATCH()
- Will then check if any of the returned nodes start with any of the words inC1:C4
using a wildcard. This is to prevent possible punctuation to avoid matching;SUM()
andISNUMBER()
prepended with double unary will lastly make a summation of hits.
Note: The answer is not 3 but 4! Since 'economic' is to be counted twice.