I am currently working on an Excel spreadsheet where in column B we have the pserial values of each person.
The question is: how many pserial values in the dataset are incorrect?
A pserial is an 8-digit value combining the 6-digit hserial value and the persnum value (Person number within household), as a 2-digit value. for example, for household 1 the pserial is: 10105101 where the last two numbers are the persum and the first 6 are the hserial. However, a concern has been raised that there may be some bugs in the data, such that some of these pserial values do not follow this rule and hence are incorrect.
I can not alter or change any of the data, even if you identify an error. This is the spreadsheet, as you can see the data starts in line 11, and there is more data up to row 9291
I am not sure of what code to use but I can use either a macro or a formula here. Feel free to use either of those.
CodePudding user response:
Here you can check each row if pserial column values are well formed. In cell D2
you can enter the following formula:
=IF(C2:C7=1*(A2:A7 & TEXT(B2:B7,"00")), TRUE, FALSE)
Note: You can use Conditional Formatting to highlight the errors as follows for the range D2:D7
:
=C2:C7=1*(A2:A7 & TEXT(B2:B7,"00")) = FALSE
It assumes the persnum column values in pserial column is formatted as two digits and the values from pserial column are numbers (not text), that is why we multiply by 1
to do the conversation before the comparison. It uses the logic you mentioned in the question for building the pserial number from hserial and persnum columns.
If you want just to calculate the total number of errors, reusing the first formula as follows:
=LET(check, IF(C2:C7=1*(A2:A7 & TEXT(B2:B7,"00")), TRUE, FALSE),
ROWS(FILTER(check, check=FALSE,0)))
CodePudding user response:
This formula checks whether the pserial is the same as hersial & persnum as 2digit value, and counts all which are wrong.
=SUM(($C$11:$C$20<>NUMBERVALUE($A$11:$A$20&IF(LEN($B$11:$B$20)=1;0&$B$11:$B$20;$B$11:$B$20)))*1)