using VBA on excel to remove duplicates strings (whole words) from a cell, but keep the last instance of the duplicate.
Example hello hi world hello => hi world hello this is hello my hello world => this is my hello world
Iam originally a python developer so excuse my lack of syntax in VBA, I have edited a piece of code found online with the following logic:
''' Function RemoveDupeWordsEnd(text As String, Optional delimiter As String = " ") As String Dim dictionary As Object Dim x, part, endword
Set dictionary = CreateObject("Scripting.Dictionary")
dictionary.CompareMode = vbTextCompare
For Each x In Split(text, delimiter)
part = Trim(x)
If part <> "" And Not dictionary.exists(part) Then
dictionary.Add part, Nothing
End If
'' COMMENT
'' if the word exists in dictionary remove previous instance and add the latest instance
If part <> "" And dictionary.exists(part) Then
dictionary.Del part, Nothing
endword = part
dictionary.Add endword, Nothing
End If
Next
If dictionary.Count > 0 Then
RemoveDupeWordsEnd = Join(dictionary.keys, delimiter)
Else
RemoveDupeWordsEnd = ""
End If
Set dictionary = Nothing
End Function
'''
Thanks all help and guidance would be very much appreciated
CodePudding user response:
Keep the Last Occurrence of Matching Substrings
Option Explicit
Function RemoveDupeWordsEnd( _
ByVal DupeString As String, _
Optional ByVal Delimiter As String = " ") _
As String
Dim dict As Object: Set dict = CreateObject("Scripting.Dictionary")
dict.CompareMode = vbTextCompare
Dim Item As Variant
Dim Word As String
For Each Item In Split(DupeString, Delimiter)
Word = Trim(Item)
If Len(Word) > 0 Then
If dict.Exists(Word) Then
dict.Remove Word
End If
dict(Word) = Empty ' the same as 'dict.Add Word, Empty'
End If
Next Item
If dict.Count > 0 Then RemoveDupeWordsEnd = Join(dict.Keys, Delimiter)
End Function
CodePudding user response:
Use VBA's replace in a while loop that terminates when the occurrences of the string drop below 2. Replace takes an optional argument for the number of matches to replace.
Function keepLast(raw As String, r As String) As String
While (Len(raw) - Len(Replace(raw, r, ""))) / Len(r) > 1
raw = Replace(raw, r, "", , 1)
Wend
keepLast = Trim(Replace(raw, " ", " "))
End Function
I use Trim and Replace any double spaces with a single space to avoid extraneous white space that is left by the removal of the target string. You could avoid the loop by just counting the number of occurrences and passing that minus 1 straight to replace:
Function keepLast(raw As String, r As String) As String
keepLast = raw
Dim cnt As Integer
cnt = (Len(raw) - Len(Replace(raw, r, ""))) / Len(r)
If cnt < 2 Then Exit Function
raw = Replace(raw, r, "", , cnt - 1)
keepLast = Trim(Replace(raw, " ", " "))
End Function
Bear in mind that this method is very susceptible to partial matches. If your raw string was "hello that Othello is a good play hello there", then you'll end up with "that O is a good play hello there", which I don't think is exactly what you want. You might use regex to address this, if it's necessary:
Function keepLast(raw As String, r As String) As String
Dim parser As Object
Set parser = CreateObject("vbscript.regexp")
parser.Global = True
parser.Pattern = "\b" & r & "\b"
While parser.Execute(raw).Count > 1
raw = parser.Replace(raw, "")
Wend
keepLast = Trim(Replace(raw, " ", " "))
End Function
The regexp object takes an argument to ignore case, if you need to handle "hello" and "Hello". You would add that like this:
parser.ignoreCase = true