I have a spreadsheet with data that spans 30,000 rows and 28 columns. I'm looking to encode the data in a specific way. The data consist of a mix of strings and numbers. For each row (columns 1 through 28), I need to convert each character in each cell to a number. I have a dictionary in place to do the conversion. Where the character is the key and the value is the encoding.
My code works, however it's kind of slow. It takes 30 minutes to accomplish the task. Which is understandable given the amount of data we are looking at. 30, 000 rows x 28 columns x N number of characters. It's a lot.
Quick Description of code below:
- Loop through each cell in Range(30,000 rows, 28 columns)
- For each row, concatenate all values into a single string
- Pass large string into dictionary character by character, retrieving an encoded value (a number).
- Write encoded string onto sheet. Where each value has it's own cell.
I'm guessing the bottleneck is when I write the encoding to the sheet in a loop. I'm wondering if their is a faster way to do this?
Sub main()
Application.DisplayAlerts = False
Application.ScreenUpdating = False
'initialize dictionary for encoding data on another module
Globals.initialize_globals
'loop through each record
Dim wkbook As Workbook: Set wkbook = Workbooks.Application.ActiveWorkbook
Dim wksheet As Worksheet: Set wksheet = wkbook.Worksheets("Raw Data")
Dim lastRow As Integer: lastRow = wksheet.Cells(Rows.Count, 1).End(xlUp).Row
Dim stringbuilder As String
Dim encoding() As Integer
Dim char_count As Integer
Dim i, ii, e As Integer
'loop through rows, columns and encode string data
For i = 2 To lastRow
'loop through columns
For ii = 1 To 27
'concatenate each cell value as a large string
stringbuilder = stringbuilder & wksheet.Range(Cells(i, ii), Cells(i, ii)).Value
Next
encoding = EncodeString(stringbuilder)
stringbuilder = ""
For e = 1 To UBound(encoding)
'write encoding onto sheet
wksheet.Range(Cells(i, 33 e), Cells(i, 33 e)) = encoding(e)
Next
Next
Application.DisplayAlerts = True
Application.ScreenUpdating = True
End Sub
Function EncodeString(str As String) As Integer()
Dim encoded() As Integer
ReDim encoded(1 To Len(str))
For i = 1 To Len(str)
' build an encoded string by passing in each character as a key, and retrieve encoded value
encoded(i) = Parameters.Item(Mid(str, i, 1))
Next
EncodeString = encoded
End Function
CodePudding user response:
Please, try the next updated code:
Sub main()
'initialize dictionary for encoding data on another module
Globals.initialize_globals
Dim wkbook As Workbook: Set wkbook = Workbooks.Application.ActiveWorkbook
Dim wksheet As Worksheet: Set wksheet = wkbook.Worksheets("Raw Data")
Dim lastRow As Integer: lastRow = wksheet.cells(rows.count, 1).End(xlUp).row
Dim stringbuilder As String, encoding() As Integer, char_count As Integer
Dim i As Long, ii As Long, e As Long
Dim arr, arrFin, N As Long, maxCol As Long 'new variables
N = 1000 'the estimated maximum number of characters on a row
arr = wksheet.Range("A2:AA" & lastRow).Value
ReDim arrFin(1 To UBound(arr), 1 To N)
'loop through array rows, columns and encode string data
For i = 2 To UBound(arr)
For ii = 1 To UBound(arr, 2)
'concatenate each cell value as a large string
stringbuilder = stringbuilder & arr(i, ii)
Next ii
encoding = EncodeString(stringbuilder)
stringbuilder = ""
If UBound(encoding) > UBound(arrFin, 2) Then
maxCol = UBound(encoding)
ReDim Preserve arrFin(1 To UBound(arr), 1 To maxCol)
End If
For e = 1 To UBound(encoding)
arrFin(i, e) = encoding(e)
Next e
Next i
if maxCol = 0 Then maxCol =ubound(arrFin, 2)
'drop the array content at once:
wksheet.Range("AG2").Resize(UBound(arrFin), maxCol).Value = arrFin
End Sub
Please, use your existing function to encode...
Not tested, of course, but this should be the idea to make the code much faster. In fact, I tried putting in code the suggestion I made in my comment.
CodePudding user response:
The bottleneck in VBA for Excel is accessing Excel from VBA, so you can dramatically reduce the runtime by reading data from a range into a 2-dimensional array and do your stuff with the data in the array.
Similarly, you can collect output data into an array and write it back at once. In your special case, it's probably easier to write the data row by row (as for every row, you might have a different length of data), but at least you can dump the data for one row in one go.
String handling within VBA is rather fast, you don't have to worry about that.
This should be fairly fast:
data = wksheet.Range(wksheet.Cells(2, 1), wksheet.Cells(lastRow, 27))
Dim row As Long, col As Long
For row = LBound(data, 1) To UBound(data, 1)
Dim stringbuilder As String
stringbuilder = ""
For col = LBound(data, 2) To UBound(data, 2)
'concatenate each cell value as a large string
stringbuilder = stringbuilder & data(row, col)
Next
If stringbuilder <> "" Then
Dim encoding() As Long
encoding = EncodeString(stringbuilder)
End If
wksheet.Cells(row 1, 33).Resize(1, UBound(encoding)) = encoding
Next
Note that you always should use Long
rather than Integer
in VBA. It avoids overflows and is faster anyhow (and it doesn't uses more memory)
Update
Made a benchmark test (regular computer, Excel 2016). 30k row * 26 cols with mixed (random) data (using a dummy encoding function that converted every character as its ascii value). It took approx. 7s to execute.
I also made a variation using a large 2-dimensional output array (like the attempt that FaneDuru did in his answer), but this took longer (13s).