Search for a set of substrings within a given string

Time:10-18

I'm trying to search for a range of substrings within a given cell.

My ultimate objective is to produce the red numbers in this example:

Example

I currently have something like:

=IF(ISNUMBER(SEARCH("COD",A2)),B2, 0) + IF(ISNUMBER(SEARCH("HADDOCK",A2)),B2, 0) + IF(ISNUMBER(SEARCH("SALMON",A2)),B2, 0)

But of course as the number of strings I search for increases it becomes awkward to maintain!

So instead of having the strings 'hardcoded' into the formula, it'd be better to reference a range of other cells with the potential values in them.

I've tried various combinations and searched online with no luck.

For example, I have tried things like the following:

{=IF(ISNUMBER(SEARCH(Fish,A2)),B2,0)}  (where Fish is the range of cells)
(but this just takes the first cell in the range called 'Fish')

=COUNTIF(rng,"*"&Fish&"*")

CodePudding user response:

First, I would lay the search strings out in columns rather than rows, and probably move them to a separate sheet entirely.

Then, there are a few ways to do it. The basic idea is to take the array of strings from the column and wrap each one in "*" wildcards so that the partial match will work. So for fish, you'll have something like:

="*" & H:H & "*"

There are two problems here: you will have included the column header (fish), and any empty cells in your range will be represented in the array by "**", which matches anything. So you need to filter the array. If you have FILTER, that's easy enough:

="*" & FILTER(H:H, H:H <> "") & "*"

But you'll still have the column header. You can avoid that by specifying the range differently:

="*" & FILTER(H2:H10, H2:H10 <> "") & "*"
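To see why the blank cells have to go, here is a minimal Python sketch of the wildcard wrapping. It is only a model, not Excel: `fnmatchcase` stands in for Excel's wildcard matching (it also treats `?` and `[...]` specially, which the terms here don't use), and the column contents are made up for illustration.

```python
from fnmatch import fnmatchcase

def wrap_in_wildcards(cells):
    """Model of "*" & range & "*": every cell gets wrapped, blank or not."""
    return ["*" + cell + "*" for cell in cells]

def wildcard_match(text, pattern):
    """Case-insensitive wildcard match, standing in for Excel's criteria matching."""
    return fnmatchcase(text.lower(), pattern.lower())

# Hypothetical contents of H2:H5: two search terms and two blank cells.
column = ["cod", "haddock", "", ""]

unfiltered = wrap_in_wildcards(column)
filtered = wrap_in_wildcards([cell for cell in column if cell != ""])

print(unfiltered)                                             # blanks became "**"
print(any(wildcard_match("APPLE", p) for p in unfiltered))    # "**" matches anything
print(any(wildcard_match("APPLE", p) for p in filtered))      # no false match once filtered
print(any(wildcard_match("FRESH COD", p) for p in filtered))  # real match still found
```

The unfiltered list matches APPLE only because of the "**" entries; once the blanks are dropped, only genuine substring matches remain.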

If you don't have FILTER, you can still do it, but I'm going to lay out the rest of the idea first and then come back to the alternative method. Once you have the filtered list of search terms, you can use it with SUM and COUNTIFS to get a match. Because you are passing COUNTIFS an array of values to match against, it produces an array of match counts, which you pass to SUM to reduce to a single value, something like this:

=SUM(COUNTIFS($A2, "*" & FILTER(H$2:H$10, H$2:H$10 <> "") & "*"))

I've made the references partially absolute ($A2, H$2:H$10) so that the formula can be copied around without rewriting it every time. This formula gives the number of terms in column H that appear in the value in column A (0 if there are none), and you can use that in an IF to get the value from column B:

=IF(SUM(COUNTIFS($A2, "*" & FILTER(H$2:H$10, H$2:H$10 <> "") & "*")), $B2, 0)

You can copy that into C2:E5, and you'll get the values you specified. If all you want is the sums, you could use SUMIFS instead of COUNTIFS (note that a row matching more than one term is added once per matching term):

=SUM(SUMIFS($B:$B, $A:$A, "*" & FILTER(H$2:H$10, H$2:H$10 <> "") & "*"))
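The IF/COUNTIFS and SUM/SUMIFS formulas above can be modeled roughly as follows. This is a Python sketch of the logic only, assuming case-insensitive substring matching and with blank terms already filtered out; the rows and terms are hypothetical sample data, not the data from the question.

```python
def count_matching_terms(cell_text, terms):
    """Model of SUM(COUNTIFS(cell, "*" & terms & "*")): how many terms occur in the cell."""
    text = cell_text.lower()
    return sum(1 for term in terms if term.lower() in text)

def value_if_match(cell_text, value, terms):
    """Model of IF(SUM(COUNTIFS(...)), value, 0): any non-zero count is treated as TRUE."""
    return value if count_matching_terms(cell_text, terms) else 0

def sum_per_term(rows, terms):
    """Model of SUM(SUMIFS(values, texts, "*" & terms & "*")): one pass over the rows per term."""
    return sum(value
               for term in terms
               for text, value in rows
               if term.lower() in text.lower())

# Hypothetical (column A, column B) rows and search terms.
rows = [("FRESH COD", 10), ("SMOKED HADDOCK", 20), ("APPLE", 5), ("COD AND HADDOCK PIE", 7)]
fish = ["cod", "haddock"]

print([value_if_match(text, value, fish) for text, value in rows])
print(count_matching_terms("COD AND HADDOCK PIE", fish))
print(sum_per_term(rows, fish))
```

In this sample the last row contains both terms, so the per-term total counts its 7 twice (44 rather than 37). If your data can match several terms in one cell, the per-row IF version summed afterwards avoids that.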

A note here: instead of passing a range to the xxxIFS functions, you could use an array literal, and unless you have a large number of constantly changing values, that might be easier to read. Another advantage of the array literal is that it should be supported by all versions of Excel, while FILTER won't be. With an array literal, the original formula looks like this:

=IF(SUM(COUNTIFS($A2, "*" & {"cod", "haddock"} & "*")), $B2, 0)

OK, so if you don't have FILTER, you'll have to do some array multiplication. You can do this with SUMPRODUCT, but I don't think it's necessary:

=IF(SUM(COUNTIFS($A2, "*" & H$2:H$10 & "*") * (H$2:H$10<>"") ), $B2, 0)

Here you're just multiplying the array generated by COUNTIFS by the array generated by <>"", which zeroes out any matches that came from blank cells.
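The multiplication trick can be sketched in Python like this. It is a model of the idea rather than of Excel itself: a blank term is treated as matching everything (as "**" does), and the term list is a hypothetical H2:H5.

```python
def countifs_array(cell_text, terms):
    """Model of COUNTIFS(cell, "*" & terms & "*"): one 0/1 per term.
    A blank term becomes "**" and matches everything; in Python, "" in text is True too."""
    text = cell_text.lower()
    return [1 if term.lower() in text else 0 for term in terms]

def nonblank_mask(terms):
    """Model of (terms <> ""): 1 for a real term, 0 for a blank cell."""
    return [1 if term != "" else 0 for term in terms]

def masked_total(cell_text, terms):
    """Element-wise product of the two arrays, then SUM."""
    counts = countifs_array(cell_text, terms)
    mask = nonblank_mask(terms)
    return sum(c * m for c, m in zip(counts, mask))

# Hypothetical H2:H5 with two blank cells left in place.
terms = ["cod", "haddock", "", ""]

print(countifs_array("APPLE", terms))   # blanks produce false matches
print(nonblank_mask(terms))
print(masked_total("APPLE", terms))     # the mask cancels them
print(masked_total("FRESH COD", terms))
```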

There is also a way to do it with SEARCH, and which one you use is a matter of preference. You don't have to wrap the terms in "*", but you still have to filter the arrays somehow, so use one of these:

=IF(SUM(--ISNUMBER(SEARCH(FILTER(H$2:H$10, H$2:H$10<>""), $A2))), $B2, 0)
=IF(SUM(--ISNUMBER(SEARCH(H$2:H$10, $A2))*(H$2:H$10<>"")), $B2, 0)

or with the array literal:

=IF(SUM(--ISNUMBER(SEARCH({"cod", "haddock"}, $A2))), $B2, 0)

So what we're doing here is calling SEARCH with the array of values, passing the resulting array to ISNUMBER to generate an array of booleans, then coercing that to an array of 0s and 1s with --. Passing that array of 0s and 1s to SUM gives us a number, and if that number is >0 it indicates a match.
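Step by step, that pipeline looks something like the following Python sketch. The `search` helper is a stand-in written for this example (it returns `None` where Excel's SEARCH would return an error), and the cell text and terms are sample values.

```python
def search(find_text, within_text):
    """Stand-in for SEARCH: 1-based, case-insensitive position, or None where Excel gives an error."""
    position = within_text.lower().find(find_text.lower())
    return position + 1 if position >= 0 else None

cell = "SMOKED HADDOCK"          # sample value for $A2
terms = ["cod", "haddock"]       # sample search terms

positions = [search(term, cell) for term in terms]   # SEARCH over the array of terms
is_number = [p is not None for p in positions]       # ISNUMBER -> array of booleans
as_ints = [int(flag) for flag in is_number]          # -- coerces booleans to 0s and 1s
total = sum(as_ints)                                  # SUM reduces to a single number

print(positions, is_number, as_ints, total)
print("match" if total > 0 else "no match")
```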

Hopefully that wasn't too long winded. Let me know if anything was unclear or if you have any trouble.

Also, it would be possible to use IF to filter the match range, like this:

="*" & IF(H2:H10<>"", H2:H10) & "*"

as long as there is no chance that FALSE will ever match any of your full strings, because blank cells come back from the IF as FALSE, so the array ends up containing "*FALSE*" entries.

Also, FILTER will reduce the size of arrays involved, since it only returns the non-empty rows, and would likely be more performant, but unless you have a lot of data I doubt it will be noticeable.

CodePudding user response:

A slightly different approach. Use the following formula:

=IF(MAX(IFERROR(SEARCH( INDEX($B$8:$H$10,MATCH(C$1,$A$8:$A$10,0), ),$A2),0))>=1,$B2,"")

...as an array formula (confirm with Ctrl+Shift+Enter instead of Enter), and fill all empty cells in the lookup table of fish/fruits/veg with something neutral (e.g. -). For example, cod, haddock, ... becomes cod, haddock, -. Enter the formula in C2, drag it down, then drag that whole column of formulas to the right.

I explain in parts:

{1} INDEX($B$8:$H$10,MATCH(C$1,$A$8:$A$10,0), ) finds the row of A8:A10 that matches the header in C1 and returns the items in that row as an array (so fish gives {cod,haddock,-,-,-}). Note that your demo only goes as far as column D, but the formula allows for items up to column H.

{2} IFERROR(SEARCH( {1} ,$A2),0) looks up, one by one, all the items from {1} in $A2. It returns an array with 0 for not found or a position >=1 for found (e.g. {0,11,0,0,0}).

{3} =IF(MAX( {2} )>=1,$B2,"") ... if the maximum of that array is >=1, we have a match and return $B2; otherwise return "" for no match.

Just remember to enter the formula as an array formula in ONE CELL, then drag or copy-paste it. Cheers.
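The three parts above can be sketched in Python as follows. This only models the logic: the dictionary stands in for the lookup table in A8:H10, its contents are hypothetical, and `search_or_zero` is a helper invented for the example.

```python
def search_or_zero(find_text, within_text):
    """Model of IFERROR(SEARCH(item, text), 0): 1-based position, or 0 when not found."""
    position = within_text.lower().find(find_text.lower())
    return position + 1 if position >= 0 else 0

def value_for_header(header, cell_text, value, table):
    """Model of IF(MAX(IFERROR(SEARCH(INDEX(...), text), 0)) >= 1, value, "")."""
    items = table[header.lower()]                                    # {1} row for this header
    positions = [search_or_zero(item, cell_text) for item in items]  # {2} one position per item
    return value if max(positions) >= 1 else ""                      # {3} any hit returns the value

# Hypothetical lookup table; empty cells are padded with "-" as the answer suggests.
table = {
    "fish": ["cod", "haddock", "-", "-", "-"],
    "fruit": ["apple", "pear", "-", "-", "-"],
}

print(value_for_header("fish", "SMOKED HADDOCK", 20, table))
print(repr(value_for_header("fruit", "SMOKED HADDOCK", 20, table)))
# Caveat: the "-" padding is itself searched for, so text containing a hyphen matches every row.
print(value_for_header("fruit", "HALF-PRICE COD", 3, table))
```

The last call shows one thing to watch: whatever neutral filler you choose must never occur in the text being searched, otherwise it counts as a match.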
