I am trying to compare a set of data from Report 1 (in column A) with a set of data from Report 2 (in column B). The reports contain data such as name, DOB, SSN, address, etc. Both reports have instances when there will be multiple data points within a single cell, such as multiple possible dates of birth, separated by a pipe |. I am trying to find all instances of overlap where data from Report 1 is also on Report 2.
I've set up a VBA script to do all of the formatting, adding columns, dropping in a comparison formula, sorting, etc but the comparison formula I've been using fails if there are any differences at all (such as multiple DOB options in Report 1 but only one DOB in Report 2). I know that there has to be a decent way to do this (some kind of string match formula), but I'm striking out and the manual way is slow and goofy. Any help would be greatly appreciated!
Doing this manually, I can add a 'helper' column to find the number of pipes in each line of the report: =LEN(A2)-LEN(SUBSTITUTE(A2,"|","")). Find the max number of pipes in that helper column, add that many columns, and use the Text to Columns button to split the info from Report 1 into multiple columns with a single piece of data in each. Repeat for Report 2. Then I can use a Countif to compare any of the cells from Report 1 with those from Report 2 in yet another column. This works more or less, but it adds a boatload of columns and that will throw off my VBA script for everything further to the right for the other data points.
Is there a formula that can be used to compare the two columns of data without splitting the data via the delimiters? Something along the lines of 'find the same pattern of characters from cell B2 anywhere within cell A2'. This would be my absolute preference if possible.
A slight change to #1, is it possible to change that slightly to 'find 7 or more characters from the pattern of characters in cell B2 within cell A2'? This would be greatly helpful in comparing SSN's for when some have leading 0's and some don't! A similar tweak would be for 'find 5 or more numbers in a pattern from B2 in A2' (zip code). Not just the same characters, but the same order of characters.
If there really is no good way to compare the two sets of data without splitting the data into multiple columns, how do I add the right numbers of columns based on the max number of pipes and then have later steps referencing to columns that are now moved over?
Thank you for any ideas and help!!
Image below for reference:
DOBs from Report 1 | DOBs from Report 2 | Output |
---|---|---|
1/1/1984 | 5/7/1986 | No Match |
6/7/1976 | 6/7/1976 | Match |
9/12/1999 | 12/9/1999 | 5/9/1999 | No Match |
1/1/1987 | 1/2/1988 | 1/3/1989 | 1/2/1988 | 1/3/1989 | Match |
Edits (thank you all so very much for the help and comments, I really appreciate it!):
InStr to find col B in col A from Maya's help:
Sub InStr_Match() Dim LastRow As Long, i As Long
LastRow = Cells(Rows.Count, "A").End(xlUp).Row ' get last row with data in column A
For i = 2 To LastRow
If InStr(1, Range("A" & i), Range("B" & i), vbt) > 0 Then Range("C" & i).Value2 = "Match"
Else Range("C" & i).Value2 = "No Match"
End If
Next i
End Sub
This is adding an answer of Match or No Match in column C after trying to find B in A. I can add an OR and send it to find A in B, which will help with some of the misses.
@JvdV - I need to match only within each row please. A DOB match between the reports but on different rows is not relevant for this purpose. Each row is a 'person' and I'm trying to find if any of their data on Report 1 is a match to any of their data on Report 2. The formatting is a bit of a mix, but I think I can try to standardize that earlier in the macro. The dates tend to be either "mm/dd/yyyy" or "mm-dd-yyyy | mm-dd-yyyy" so I can replace the -'s with /'s to standardize that. I can also throw '=text(A2,"mm/dd/yyyy") at them to standardize them a bit better (I need to tweak that so that it's VBA rather than a formula). I'm on Excel 365 usually.
@Ike - thanks for fixing the chart! The output I'm looking for is match/no match, or true/false. I've added that to the example.
CodePudding user response:
You will have to split the Value in B into single data-points and compare each of them. If one match is enough (and you dont want the next miss-hit to erase it), it should something like:
Sub InStr_Match()
Dim LastRow As Long, i As Long
LastRow = Cells(Rows.Count, "A").End(xlUp).Row ' get last row with data in column A
For i = 2 To LastRow
Dim arr() As String
arr = Split(Range("B" & i), " | ")
Dim name As Variant
For Each name In arr
If InStr(1, Range("A" & i), name, vbt) > 0 Then
Range("C" & i).Value2 = "Match"
GoTo matchfound
Else
Range("C" & i).Value2 = "No Match"
End If
Next
matchfound:
Next i
End Sub
CodePudding user response:
If you have various of the Office 365 functions, you can do this with a single worksheet formula in one cell:
(You will need to replace Report1
and Report2
with the appropriate range references)
=LET(rpt_1,--TRIM(TEXTSPLIT(TEXTJOIN("|",TRUE,Report1),,"|")),
rpt_2,--TRIM(TEXTSPLIT(TEXTJOIN("|",TRUE,Report2),,"|")),
FILTER(rpt_1,ISNUMBER(XMATCH(rpt_1,rpt_2))))
CodePudding user response:
Short answer is that this would be much easier to do via a macro.
This solution is a little "over-engineered", but I wanted to demonstrate converting strings to arrays in VBA and doing quick comparisons.
It will compare the dates in the two columns, and paste the matches in another column:
Public Function InArray(theValue As Variant, theArray) As Boolean
'________________________________________________________________________________________________
' Purpose:
' -> to assess if a value is in an array
'________________________________________________________________________________________________
' Parameters:
' - theValue, any type
' -----> the value to search for in the array
' - theArray, Array
' -----> The array to search for the value in
'________________________________________________________________________________________________
' Returns:
' - Boolean
' -----> Boolean is True if value is in array, False if not
'________________________________________________________________________________________________
On Error GoTo Err
Dim iter As Long
For iter = LBound(theArray) To UBound(theArray)
If theArray(iter) = theValue Then
InArray = True
Exit Function
End If
Next iter
InArray = False
Exit Function
Err:
' on error, tell the user the error and stop the code for debugging
Call MsgBox("An error occurred in the Public Function InArray. The code will stop:" _
& vbCrLf & vbCrLf & Err.Description, vbCritical)
Stop
End Function
Sub CompareDates()
Dim reportWorksheet As Worksheet
Dim reportColumnOneInt As Integer
Dim reportColumnTwoInt As Integer
Dim matchColumnInt As Integer
Dim rowPosition As Long
Dim startingRow As Integer
Dim endRow As Integer
Dim columnOneArray As Variant
Dim columnTwoArray As Variant
Dim columnOneString As String
Dim columnTwoString As String
Dim matchesString As String
Dim arrayIter As Long
' Set objects
Set reportWorksheet = ActiveWorkbook.Sheets("Sheet1")
' Set variables
startingRow = 2
reportColumnOneInt = 1
reportColumnTwoInt = 2
matchColumnInt = 3
endRow = reportWorksheet.Range("A" & reportWorksheet.Rows.Count).End(xlUp).Row
' Iterate down the rows
For rowPosition = startingRow To endRow
columnOneString = Replace(CStr(reportWorksheet.Cells(rowPosition, reportColumnOneInt).Value), " ", "")
columnTwoString = Replace(CStr(reportWorksheet.Cells(rowPosition, reportColumnTwoInt).Value), " ", "")
' convert first column values to array
If InStr(1, columnOneString, "|") = 0 Then
columnOneArray = Array(columnOneString)
Else
columnOneArray = Split(columnOneString, "|")
End If
' convert second column values to array
If InStr(1, columnTwoString, "|") = 0 Then
columnTwoArray = Array(columnTwoString)
Else
columnTwoArray = Split(columnTwoString, "|")
End If
' Iterate over first column, checking to see if values in second column
matchesString = ""
For arrayIter = LBound(columnOneArray) To UBound(columnOneArray)
columnOneArray(arrayIter) = Trim(columnOneArray(arrayIter))
Debug.Print (columnOneArray(arrayIter))
'Check if values in first column match any in second
If InArray(columnOneArray(arrayIter), columnTwoArray) = True Then
If matchesString = "" Then
matchesString = columnOneArray(arrayIter)
Else
matchesString = matchesString & " | " & columnOneArray(arrayIter)
End If
End If
Next arrayIter
' set value of match column
reportWorksheet.Cells(rowPosition, matchColumnInt).Value = matchesString
Next rowPosition
End Sub