Home > Back-end >  VBA/Excel - Count unique words in columns with multiple words in each cell
VBA/Excel - Count unique words in columns with multiple words in each cell

Time:10-27

I am working with the below dataset. For the each company I would like to understand how many different product they order.

For example: company 'AAA' order 6 different products (Product 1,2,3,4,5,7).

Example

Not sure, if we need to split words in each column and later count one by one in the loop or is there any faster method? I have to use VBA here, and my dataset is more than 100k.

CodePudding user response:

You could maybe piece something together using, assuming data in A1:C?:

Sub Test()

Dim arr As Variant
Dim lr As Long, x As Long, y As Long
Dim dict1 As Object: Set dict1 = CreateObject("Scripting.Dictionary")
Dim dict2 As Object: Set dict2 = CreateObject("Scripting.Dictionary")

'Get initial array (NOTE: implicit reference to the active worksheet)
lr = Cells(Rows.Count, "A").End(xlUp).Row
arr = Range("A2:C" & lr)

'Loop through array and fill dictionary
For x = LBound(arr) To UBound(arr)
    dict1(arr(x, 1)) = dict1(arr(x, 1)) & "," & arr(x, 3)
Next

'Loop through dictionary and count unique items
For y = 0 To dict1.Count - 1
    For Each el In Split(dict1.Items()(y), ",")
        dict2(el) = 1
    Next
    dict1(dict1.keys()(y)) = dict2.Count - 1
    dict2.RemoveAll
    
    'Check the result
    Debug.Print dict1.keys()(y) & "-" & dict1.Items()(y)
Next

End sub

CodePudding user response:

This answer might seem very silly, but as you are separating the different products with a comma, why not simply count the amount of commas and add 1, something like:

=SEARCH(",",C2,1) 1

Once you have this in a helper column, you can use Excel's basic Subtotals feature for finding the sum per customer.

CodePudding user response:

Please, test the next code. It will return (in the above code in the next sheet, but it can return in any sheet) the unique client, followed by total products count and in the next columns the ordered products:

Sub ProductsPerClient()
    Dim sh As Worksheet, sh1 As Worksheet, lastR As Long, arr, arrSpl, arrFin, colMax As Long
    Dim i As Long, j As Long, dict As Object
    
    Set sh = ActiveSheet
    Set sh1 = sh.Next 'use here the sheet you need
    lastR = sh.Range("A" & sh.rows.count).End(xlUp).row
    arr = sh.Range("A2:C" & lastR).value
    Set dict = CreateObject("Scripting.Dictionary")
    For i = 1 To UBound(arr)
        arrSpl = Split(Trim(arr(i, 3)), ",")
        If Not dict.Exists(arr(i, 1)) Then
            dict.Add arr(i, 1), Join(arrSpl, "|")
            If UBound(arrSpl)   1 > colMax Then colMax = UBound(arrSpl)   1
        Else
            dict(arr(i, 1)) = dict(arr(i, 1)) & "|" & Join(arrSpl, "|")
            If UBound(Split(dict(arr(i, 1)), "|"))   1 > colMax Then colMax = UBound(Split(dict(arr(i, 1)), "|"))   1
        End If
    Next i
    ReDim arrFin(1 To dict.count, 1 To colMax   2)

    For i = 0 To dict.count - 1
        arrFin(i   1, 1) = dict.Keys()(i)
        arrSpl = Split(dict.items()(i), "|")
        arrFin(i   1, 2) = UBound(arrSpl)   1
        For j = 0 To UBound(arrSpl)
            arrFin(i   1, j   3) = arrSpl(j)
        Next j
    Next i
    'drop the final array content:
    sh1.Range("A2").Resize(UBound(arrFin), UBound(arrFin, 2)).value = arrFin
 End Sub
  • Related