Home > front end >  Regexpmatch in excel - 5 characters Match
Regexpmatch in excel - 5 characters Match

Time:12-30

I am in need of help finding 5 character patterns between 2 cells in the same worksheet.

I found information online to set up an example code and I thought I would be able to tweak it to fix but is not working. Can anyone help me?

Here is what I am hoping to achieve:

Here is the formula I put into column C:

=Regexpmatch(A1:B1,"^[\S]{5}")

And the code in Visual Basic: Module1(Code)

    Public Function RegExpMatch(input_range As Range, pattern As String, Optional match_case As Boolean =    True) As Variant
  Dim arRes() As Variant 'array to store the results
  Dim iInputCurRow, iInputCurCol, cntInputRows, cntInputCols As Long 'index of the current row in the source range, index of the current column in the source range, count of rows, count of columns

  On Error GoTo ErrHandl

      RegExpMatch = arRes

      Set regEx = CreateObject("VBScript.RegExp")
      regEx.pattern = pattern
      regEx.Global = True
      regEx.MultiLine = True
      If True = match_case Then
        regEx.ignorecase = False
      Else
        regEx.ignorecase = True
      End If

      cntInputRows = input_range.Rows.Count
      cntInputCols = input_range.Columns.Count
      ReDim arRes(1 To cntInputRows, 1 To cntInputCols)

      For iInputCurRow = 1 To cntInputRows
        For iInputCurCol = 1 To cntInputCols
      arRes(iInputCurRow, iInputCurCol) = regEx.Test(input_range.Cells(iInputCurRow, iInputCurCol).Value)
    Next
      Next

      RegExpMatch = arRes
  Exit Function
    ErrHandl:
    RegExpMatch = CVErr(xlErrValue)
    End Function
    Sub Run()

    End Sub

I put this formula into Column C and received results in both Columns C and D. However, I cannot tell what it is even pulling as all the values in Column C are TRUE and I see no pattern or reason to why I received the FALSEs where I did.

CodePudding user response:

Your regex pattern will only return a match based on the first five non-space characters in the string.

It seems to me that what you really want to do is return TRUE if there are matching Words in the two strings, and if those words are five or more characters in length.

If that is not the case, please clarify.

  • Split each string into a list or array of Words
    • For Column1, a Word is separated by @, ., or the transition from lower case to upper case letters.
    • For Column 2, a Word is separated by a Space
  • Filter each list to only retain words containing five or more characters
  • Check to see if a word is present in both lists.

This can be done using VBA and/or Power Query.

Here is a Power Query solution: Power Query is available in Windows Excel 2010 and Excel 365 (Windows or Mac)

To use Power Query

  • Select some cell in your Data Table
  • Data => Get&Transform => from Table/Range
  • When the PQ Editor opens: Home => Advanced Editor
  • Make note of the Table Name in Line 2
  • Paste the M Code below in place of what you see
  • Change the Table name in line 2 back to what was generated originally.
  • Read the comments and explore the Applied Steps to understand the algorithm

M Code

let

//Change next line to reflect actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table21"]}[Content],

//Set the data types
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),

//Add custom column
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Match", each 
        let 

/*Split column 1 by 
  Transition from lower case to upper case character.
  Then by `@` or `.`
    Filter to include only words with five or more characters*/
            #"Split Col1" = 
                List.Select(
                    List.Combine(
                        List.Transform(
                            Splitter.SplitTextByCharacterTransition({"a".."z"},{"A".."Z"})([Column1]), 
                            each Text.SplitAny(_,"@."))), each Text.Length(_)>=5),

/*Split Column 2 by <space>
    Filter to include only words with length >=5*/
            #"Split Col2" = 
                List.Select(
                    Text.Split([Column2]," "), 
                    each Text.Length(_)>=5),

/*Create a List of words that are in both of the above lists
   If there are one or more words in the Intersection of the two lists
   then True, else False*/
            Match = 
                List.Intersect(
                    {#"Split Col1",#"Split Col2"},Comparer.OrdinalIgnoreCase)
        in 
            List.Count(Match) > 0, type logical)
in
    #"Added Custom"

enter image description here

CodePudding user response:

Seems like you could try:

enter image description here

Formula in C1:

=LET(x,TEXTSPLIT(B1," "),SUM(IFERROR(SEARCH(LEFT(FILTER(x,LEN(x)>4),5),A1),))>0)
  • Related