I have some data that looks like this (more than 400 columns) :
year | ID | fake_num1 | fake_num2 | text1 |
---|---|---|---|---|
2019 | 11 | 36 000 | 10'000 | text, 1 |
2020 | 12 | -1 275 | 1 000,00 | text 2 |
Columns fake_num1 and fake_num2 are stored as text. What I'm trying to achieve is
- Identify those fake numbers columns
- Clean the data (e.g. remove space, columns, replace comma by points) with a for loop
I need some help with step 1. I have to identify columns fake_num1 and fake_num2, while avoiding columns like text1. I was thinking of going with regexp but maybe there is another solution.
I used part of the code here: SO regexp, however I am not sure how to proceed from there.
Dim strPattern as String: strPattern = "^[0-9]$"
will find anything that starts and ends with a number, and only has numbers (if my comprehension is correct). What's the best way to manage the cases listed in the table above ?
CodePudding user response:
Please, try the next code, It considers "fake numbers columns" as ones where replacing the necessary characters makes from string a number:
Sub testMakeNumbers()
Dim sh As Worksheet, lastR As Long, lastCol As Long, i As Long, rngCol As Range
Set sh = ActiveSheet 'you can use here the necessary sheet
lastR = sh.Range("A" & sh.rows.Count).End(xlUp).row
lastCol = sh.cells(1, Columns.Count).End(xlToLeft).Column
'determine the problematic columns:
For i = 1 To lastCol
If Not IsNumeric(sh.cells(2, i).Value) And _
IsNumeric(Replace(Replace(Replace(sh.cells(2, i).Value, " ", ""), "'", ""), ",", ".")) Then
If rngCol Is Nothing Then
Set rngCol = sh.cells(2, i)
Else
Set rngCol = Union(rngCol, sh.cells(2, i))
End If
End If
Next
'replace the characters making the string as number:
With Intersect(rngCol.EntireColumn, sh.Range("A2", sh.cells(lastR, lastCol)))
.Replace ",", "."
.Replace Chr(160), ""
.Replace " ", ""
.Replace "'", ""
End With
End Sub