Home > Back-end >  How do I code a macro in VBA that deletes columns in excel that don't appear in an array?
How do I code a macro in VBA that deletes columns in excel that don't appear in an array?

Time:08-24

I'm creating a macro that is formatting a collection of files and a step in this process is to delete columns that aren't required, keeping a specific set of columns.

I know I can delete columns based on their location and I have this approach implemented already ie 1,3,7 etc or A, C, G etc. But I'm conscious that the report being used might change layout or add extra columns over time and I want to ensure the required columns are kept.

Ideally this code would cycle through each column header starting at A1 until the last column and delete an entire column if the header value isn't found in a list. This list will be an array captured from a range in one of the sheets in the workbook.

List = {Blue, Green, Orange}

Original Table

Blue Red Green Orange Black
row row row row row

Formatted Table

Blue Green Orange
row row row

Does anyone have any suggestions on the approach I could take to get this working or if it's even possible? Any help would be greatly appreciated

CodePudding user response:

You might profit from the following approach by reordering a datafield array via Application.Index which allows even to move the existing columns to any new position.

Note: this flexible solution can be time consuming for greater data sets, where I would prefer other ways you can find in a lot of answers at SO.

Sub ReorderColumns()
Const headerList As String = "Blue,green,Orange"
'a) define source range
    Dim src As Range
    Set src = Tabelle3.Range("A1:E100")
'b) define headers
    Dim allHeaders: allHeaders = src.Resize(1).Value2
    Dim newHeaders: newHeaders = Split(headerList, ",")
'c) get column positions in old headers
    Dim cols
    cols = getCols(newHeaders, allHeaders)
'd) define data
    Dim data As Variant
    data = src.Value2
'e) reorder data based on found column positions
    data = Application.Index(data, Evaluate("row(1:" & UBound(data) & ")"), cols)
'f) overwrite source data
    src = vbNullString      ' clear
    src.Resize(UBound(data), UBound(data, 2)) = data
End Sub

Help function getCols()

Function getCols(individualHeaders, allHeaders)
'Purp: get 1-based column numbers of found headers via Match
    getCols = Application.Match(individualHeaders, allHeaders, 0)  ' 1-based
End Function

CodePudding user response:

Dynamic Named Range

I think a dynamic named range is an excellent choice for storing and retrieving your required columns. Please see the link I provided from https://exceljet.net/ to setup your dynamic named range.

Generic formula =$A$2:INDEX($A:$A,COUNTA($A:$A))

Regular Expression Approach

After reading in your named range, one approach for testing your columns is using regular expressions. To use this you will need to set a library reference to Microsoft VBScript Regular Expressions 5.5. The pipe character | represents an or statement, so we can join our array using that delimiter.

Deleting Ranges in loops

When deleting columns or rows within a loop, the best approach I have found is to union the ranges together in a variable and execute the deletion in one go. This helps performance and it prevents errors from deleting ranges the loop is working on.

I do this so often that I created a custom function for this UnionRange

' Helper function that allows
' concatinating ranges together
Public Function UnionRange( _
    ByRef accumulator As Range, _
    ByRef nextRange As Range _
)
    If accumulator Is Nothing Then
        Set UnionRange = nextRange
    Else
        Set UnionRange = Union(accumulator, nextRange)
    End If
End Function

Putting it all together

Below is my implementation of what your code could look like, just make sure to first:

  • Create a Dynamic Named Range and populate with your required headers
  • Add Microsoft VBScript Regular Expressions 5.5 reference
  • Update Sheet1 to whatever sheet your table exists (possibly change logic for finding header row based on your needs)
' Need Regular Expressions Referenced in order to work!
' @libraryReference {Microsoft VBScript Regular Expressions 5.5}
Public Sub DemoDeletingNonRequiredColumns()
    ' Make sure to create a named range
    ' otherwise this section will fail. In this
    ' example the named range is `RequiredColumns`
    Dim requiredColumns() As Variant
    requiredColumns = Application.WorksheetFunction.Transpose( _
        Range("RequiredColumns").Value2 _
    )
    
    ' To test if the column is in the required
    ' columns this method uses regular expressions.
    With New RegExp
        .IgnoreCase = True
        ' The pipe charactor is `or` in testing.
        .Pattern = Join(requiredColumns, "|")
        
        Dim headerRow As Range
        ' This example uses `Sheet1`, but update to
        ' the actual sheet you are using.
        With Sheet1
            Set headerRow = .Range("A1", .Cells(1, Columns.Count).End(xlToLeft))
        End With
        
        Dim column As Range
        For Each column In headerRow
            
            ' If the column name doesn't match the
            ' pattern, then concatinate it to the
            ' toDelete range.
            If Not .Test(column.Value2) Then
                Dim toDelete As Range
                Set toDelete = UnionRange(toDelete, column.EntireColumn)
            End If
        Next
    End With
    
    ' toDelete is used as it provides better performance
    ' and it also prevents errors when deleting columns
    ' while looping.
    If Not toDelete Is Nothing Then
        toDelete.Delete
    End If
End Sub
  • Related