Home > Software design >  VBA code to find the longest set of characters among similarly starting ones , multiple times
VBA code to find the longest set of characters among similarly starting ones , multiple times

Time:10-05

I would like to find the longest set of characters among similarly starting ones in a huge pile of strings. I know this formula: {=INDEX(rng,MATCH(MAX(LEN(rng)),LEN(rng),0))} but here I need to define the range every single time where a new set of 6 characters will appear. I can retrieve the similarities with {=LEFT(cell,6)} but then I still have to define the range where the set of characters starts with the same 6.

This would be a nightmare since I have so much data. Starts like this:

blabla, 3, 4 
blabla, 3, 4, 5 
blabla, 3, 4, 5, 6 
abcdef, 3 
abcdef, 3, 4 
qwertz, 2, 3 
qwertz, 2, 3, 4 
qwertz, 2, 3, 4, 5

etc.

And from these, I only need

blabla, 3, 4, 5, 6 
abcdef, 3, 4 
qwertz, 2, 3, 4, 5 

[so the longest ones from the similarly starting ones]. Thank you for any help!

CodePudding user response:

if you have Excel 365 current channel you can use this formula:

=LET(unique, UNIQUE(LEFT(A1:A8,6)),
BYROW(unique,LAMBDA(r,TAKE(SORT(FILTER(A1:A8,LEFT(A1:A8,6)=r),1,-1),1))))

It first retrieves the unique parts (first 6 characters) Then per each unique prefix sorts the value descending and takes the top value - which is the longest one.

CodePudding user response:

The following code will assume your strings are found in Column A. It outputs the longest strings in Column B.

I didn't have time to comment the code yet or make it pretty, but basically it stores each string into a dictionary (key = left 6 characters, value = entire string).

If the dictionary doesn't have the string already, it places it into the dictionary.

If the dictionary already has the string, it checks to see if the new value is longer than the dictionary's value. If so, it updates the dictionary accordingly.

The main looping is done by just checking every value in Column A until it hits an empty cell.

Private Sub findLongestSet()

Dim setDict As Object
Set setDict = New Dictionary

Dim iterations As Integer
iterations = 0

Dim val As String

While Not IsEmpty(Range("A1").Offset(iterations, 0).Value)
    
    If Not setDict.Exists(Left(Range("A1").Offset(iterations, 0).Value, 6)) Then
        setDict.Add Left(Range("A1").Offset(iterations, 0).Value, 6), Range("A1").Offset(iterations, 0).Value
    End If
    
    If setDict.Exists(Left(Range("A1").Offset(iterations, 0).Value, 6)) Then
        If Len(setDict(Left(Range("A1").Offset(iterations, 0).Value, 6))) < Len(Range("A1").Offset(iterations, 0).Value) Then
            setDict(Left(Range("A1").Offset(iterations, 0).Value, 6)) = Range("A1").Offset(iterations, 0).Value
        End If
    End If
    
    iterations = iterations   1

Wend

Dim index As Integer
index = 0

For Each varKey In setDict.Keys
    Range("B1").Offset(index, 0).Value = setDict(varKey)
    index = index   1
Next

End Sub

Results: input and output

  • Related