Home > Net >  Remove duplicate string from cell but keep last instance of duplicate
Remove duplicate string from cell but keep last instance of duplicate

Time:02-19

using VBA on excel to remove duplicates strings (whole words) from a cell, but keep the last instance of the duplicate.

Example hello hi world hello => hi world hello this is hello my hello world => this is my hello world

Iam originally a python developer so excuse my lack of syntax in VBA, I have edited a piece of code found online with the following logic:

''' Function RemoveDupeWordsEnd(text As String, Optional delimiter As String = " ") As String Dim dictionary As Object Dim x, part, endword

Set dictionary = CreateObject("Scripting.Dictionary")
dictionary.CompareMode = vbTextCompare
For Each x In Split(text, delimiter)
    part = Trim(x)
    
    If part <> "" And Not dictionary.exists(part) Then
        dictionary.Add part, Nothing
    End If

    '' COMMENT
    '' if the word exists in dictionary remove previous instance and add the latest instance
    If part <> "" And dictionary.exists(part) Then
            dictionary.Del part, Nothing
            endword = part
            dictionary.Add endword, Nothing
    End If
    
   
Next

If dictionary.Count > 0 Then
    RemoveDupeWordsEnd = Join(dictionary.keys, delimiter)
Else
    RemoveDupeWordsEnd = ""
End If

Set dictionary = Nothing

End Function

'''

Thanks all help and guidance would be very much appreciated

CodePudding user response:

Keep the Last Occurrence of Matching Substrings

Option Explicit

Function RemoveDupeWordsEnd( _
    ByVal DupeString As String, _
    Optional ByVal Delimiter As String = " ") _
As String
    
    Dim dict As Object: Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare
    
    Dim Item As Variant
    Dim Word As String
    
    For Each Item In Split(DupeString, Delimiter)
        Word = Trim(Item)
        If Len(Word) > 0 Then
            If dict.Exists(Word) Then
                dict.Remove Word
            End If
            dict(Word) = Empty ' the same as 'dict.Add Word, Empty'
        End If
    Next Item

    If dict.Count > 0 Then RemoveDupeWordsEnd = Join(dict.Keys, Delimiter)

End Function

CodePudding user response:

Use VBA's replace in a while loop that terminates when the occurrences of the string drop below 2. Replace takes an optional argument for the number of matches to replace.

Function keepLast(raw As String, r As String) As String
  
  While (Len(raw) - Len(Replace(raw, r, ""))) / Len(r) > 1
    raw = Replace(raw, r, "", , 1)
    Wend
    
  keepLast = Trim(Replace(raw, "  ", " "))
    
End Function

I use Trim and Replace any double spaces with a single space to avoid extraneous white space that is left by the removal of the target string. You could avoid the loop by just counting the number of occurrences and passing that minus 1 straight to replace:

Function keepLast(raw As String, r As String) As String
  
  keepLast = raw
  
  Dim cnt As Integer
  cnt = (Len(raw) - Len(Replace(raw, r, ""))) / Len(r)
  If cnt < 2 Then Exit Function
  
  raw = Replace(raw, r, "", , cnt - 1)
  keepLast = Trim(Replace(raw, "  ", " "))

End Function

Bear in mind that this method is very susceptible to partial matches. If your raw string was "hello that Othello is a good play hello there", then you'll end up with "that O is a good play hello there", which I don't think is exactly what you want. You might use regex to address this, if it's necessary:

Function keepLast(raw As String, r As String) As String
  
  Dim parser As Object
  Set parser = CreateObject("vbscript.regexp")
  parser.Global = True
  parser.Pattern = "\b" & r & "\b"
  
  While parser.Execute(raw).Count > 1
    raw = parser.Replace(raw, "")
    Wend
  
  keepLast = Trim(Replace(raw, "  ", " "))

End Function

The regexp object takes an argument to ignore case, if you need to handle "hello" and "Hello". You would add that like this:

parser.ignoreCase = true
  • Related