I have this function that finds credit card numbers in an input string using regex and masks all but the last 4 digits:
public CharSequence obfuscate(CharSequence data) {
    String[] result = data.toString().replaceAll("[^a-zA-Z0-9-_*]", " ").trim().replaceAll(" +", " ").split(" ");
    for (String str : result) {
        String originalString = str;
        String cleanString = str.replaceAll("[-_]", "");
        CardType cardType = CardType.detect(cleanString);
        if (!CardType.UNKNOWN.equals(cardType)) {
            // 'replacement' is assumed to be a String field holding the mask, e.g. "*"
            String maskedReplacement = maskWithoutLast4Digits(cleanString, replacement);
            data = data.toString().replace(originalString, maskedReplacement);
        }
    }
    return data;
}
static String maskWithoutLast4Digits(String input, String replacement) {
    if (input.length() < 4) {
        return input;
    }
    return input.replaceAll(".(?=.{4})", replacement);
}
// pattern enum
import java.util.regex.Pattern;

public enum CardType {
    UNKNOWN,
    VISA("^4[0-9]{12}(?:[0-9]{3}){0,2}$"),
    MASTERCARD("^(?:5[1-5]|2(?!2([01]|20)|7(2[1-9]|3))[2-7])\\d{14}$"),
    AMERICAN_EXPRESS("^3[47][0-9]{13}$"),
    DINERS_CLUB("^3(?:0[0-5]|[68][0-9])[0-9]{11}$"),
    DISCOVER("^6(?:011|[45][0-9]{2})[0-9]{12}$");

    private Pattern pattern;

    CardType() {
        this.pattern = null;
    }

    CardType(String pattern) {
        this.pattern = Pattern.compile(pattern);
    }

    public static CardType detect(String cardNumber) {
        for (CardType cardType : CardType.values()) {
            if (null == cardType.pattern) continue;
            if (cardType.pattern.matcher(cardNumber).matches()) return cardType;
        }
        return UNKNOWN;
    }

    public Pattern getPattern() {
        return pattern;
    }
}
input1: "Valid American Express card: 371449635398431"
output1: "Valid American Express card: ***********8431"
input2: "Invalid credit card: 1234222222222" // does not match any credit card pattern
output2: "Invalid credit card: 1234222222222"
input3: "Valid American Express card with garbage characters: <3714-4963-5398-431>"
output3: "Valid American Express card with garbage characters: <***********8431>"
This is not the best way to do the masking, since this method will be called for each tag in a huge HTML file and for each line in huge text files. How can I improve the performance of this method?
Answer:
Wouldn't it be nice if all validation were done before the card numbers went into the database (or data files)?
I'm not convinced that using regex for any part of your code is the best course to take if speed is what you want, since processing regular expressions can take a lot of time. As an example, take the line that does the string masking in the maskWithoutLast4Digits() method:
static String maskWithoutLast4Digits(String input, String replacement) {
    if (input.length() <= 4) {
        return input; // There is nothing to mask!
    }
    return input.replaceAll(".(?=.{4})", replacement);
}
Then replace it with this code:
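// Requires: import java.util.Arrays;
// Note: replacement is now a char (e.g. '*'), because Arrays.fill() on a char[] needs a char
static String maskWithoutLast4Digits(String input, char replacement) {
    if (input.length() <= 4) {
        return input; // There is nothing to mask!
    }
    char[] chars = input.toCharArray();
    // Overwrite every character except the last 4
    Arrays.fill(chars, 0, chars.length - 4, replacement);
    return new String(chars);
}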
You will probably find that this version masks a single credit card number string almost twice as fast as the regex version, partly because String.replaceAll() compiles a new Pattern on every call. That's a considerable difference. In fact, if you run the code through a profiler, you may well find that the regex version slows things down more and more as strings are processed, whereas the char-array version keeps things flowing at a more constant speed.
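If you still want a regex for masking, a middle ground is to compile the pattern once and reuse it instead of calling String.replaceAll() each time. The sketch below (the MaskCompare class and its method names are hypothetical) puts a precompiled-regex version next to the char-array version so you can confirm they produce the same result before benchmarking them against each other.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper comparing two masking strategies that should give identical output.
final class MaskCompare {
    // Compiled once and reused; String.replaceAll() would recompile it on every call.
    private static final Pattern ALL_BUT_LAST_4 = Pattern.compile(".(?=.{4})");

    // Regex version: replaces every character that still has at least 4 characters after it.
    static String maskWithRegex(String input, char replacement) {
        if (input.length() <= 4) {
            return input; // nothing to mask
        }
        // quoteReplacement stops '$' or '\' from being treated as special in the replacement
        return ALL_BUT_LAST_4.matcher(input)
                .replaceAll(Matcher.quoteReplacement(String.valueOf(replacement)));
    }

    // Char-array version: overwrite everything except the last 4 characters.
    static String maskWithArray(String input, char replacement) {
        if (input.length() <= 4) {
            return input; // nothing to mask
        }
        char[] chars = input.toCharArray();
        Arrays.fill(chars, 0, chars.length - 4, replacement);
        return new String(chars);
    }
}
```

Both methods return `***********8431` for `371449635398431`, so whichever one benchmarks faster in your environment can be dropped in.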
Card numbers from different networks mostly start with a specific single digit, with a few exceptions. For example, if a credit card number begins with 3, it belongs to the American Express, Diners Club, JCB, or Carte Blanche networks. If it begins with 4, it is a Visa. Numbers that begin with 5 are Mastercards (newer Mastercard numbers can also begin with 2221 through 2720), while cards that begin with 6 belong to the Discover network.
Card                Starts With                 No. of Digits
==================================================================
American Express    34 or (usually) 37          15
JCB                 35                          16
Diners Club         usually 36, can be 38       14
Visa                4                           16
Mastercard          5                           16
Discover            6                           16
You don't need regex to determine whether a credit card number starts with any of these values. It should also be noted that cards don't always contain the same number of digits. As I'm sure you already know, that can depend on the issuer; nevertheless, credit cards in the Visa, Mastercard, and Discover networks usually have 16 digits, while those in the American Express network have 15. While 16 digits is the most common, card numbers can have as few as 13 and as many as 19. I haven't scoured your regexes, but I'm sure they have that covered (right?).
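Checking that a candidate contains only digits and falls within the 13 to 19 digit range needs no regex either. A minimal sketch, assuming separators have already been removed from the input (the CardLength class and isPlausibleCardNumber() name are hypothetical), usable as a cheap pre-filter before any prefix or Luhn check:

```java
// Hypothetical pre-filter: rejects obvious non-card strings before more expensive checks.
final class CardLength {
    static final int MIN_DIGITS = 13;
    static final int MAX_DIGITS = 19;

    // True if s consists only of ASCII digits and has between 13 and 19 of them.
    static boolean isPlausibleCardNumber(String s) {
        if (s == null || s.length() < MIN_DIGITS || s.length() > MAX_DIGITS) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Compare against '0'..'9' rather than Character.isDigit(),
            // which also accepts non-ASCII digits
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }
}
```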
To remove the use of regex, you could use a switch/case mechanism instead, for example:
// Demo card number...
String cardNumber = "371449635398431";
// Remove all characters other than digits; they aren't needed for validation.
cardNumber = cardNumber.replaceAll("\\D", "");
String cardName; // Used to store the card's name
switch (cardNumber.substring(0, 1)) {
    case "3":
        String typeNum = cardNumber.substring(0, 2);
        switch (typeNum) {
            case "34": case "37":
                cardName = "American-Express";
                break;
            case "35":
                cardName = "JCB";
                break;
            case "30": case "36": case "38": case "39":
                cardName = "Diners-Club";
                break;
            default:
                cardName = "UNKNOWN";
        }
        break;
    case "4":
        cardName = "Visa";
        break;
    case "5":
        cardName = "Mastercard";
        break;
    case "6":
        cardName = "Discover";
        break;
    default:
        cardName = "UNKNOWN";
}
If you run speed tests on this code against iterating through a series of regexes, I believe you will find a considerable speed improvement, even if you also check the length of each card number within each case.
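For example, the length check can be folded into each case using the digit counts from the table above. This is a sketch under the table's assumptions only (it does not cover, for instance, Mastercard's 2-series numbers or 13- and 19-digit Visa numbers), and the CardDetect class and detectCard() name are hypothetical:

```java
// Hypothetical detector: network prefix plus the expected length from the table above.
final class CardDetect {
    static String detectCard(String digits) {
        if (digits.length() < 2) {
            return "UNKNOWN"; // substring(0, 2) below needs at least 2 digits
        }
        int len = digits.length();
        switch (digits.charAt(0)) {
            case '3':
                switch (digits.substring(0, 2)) {
                    case "34": case "37":
                        return len == 15 ? "American-Express" : "UNKNOWN";
                    case "35":
                        return len == 16 ? "JCB" : "UNKNOWN";
                    case "30": case "36": case "38": case "39":
                        return len == 14 ? "Diners-Club" : "UNKNOWN";
                    default:
                        return "UNKNOWN";
                }
            case '4':
                return len == 16 ? "Visa" : "UNKNOWN";
            case '5':
                return len == 16 ? "Mastercard" : "UNKNOWN";
            case '6':
                return len == 16 ? "Discover" : "UNKNOWN";
            default:
                return "UNKNOWN";
        }
    }
}
```

Every branch returns directly, so there is no fall-through between cases and no `break` statements are needed.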
The standard way to validate a credit card number is the Luhn formula (also known as the Luhn algorithm), which follows this scheme:
- Starting from the rightmost digit (the check digit) and moving left, double the value of every second digit. If the result of any doubling is greater than 9 (for example, 7 x 2 = 14 or 9 x 2 = 18), add the digits of that result (e.g., 14: 1 + 4 = 5, or 18: 1 + 8 = 9).
- Now add up all the resulting digits, including the digits you did not double.
- If the total ends in 0 (that is, it is a multiple of 10), the card number is valid according to the Luhn algorithm; otherwise it is not valid.
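As a worked example, here are those steps applied to the demo number 371449635398431, counting positions from the right so that the check digit 1 is position 1:

```latex
\begin{aligned}
\text{doubled digits (positions } 2, 4, \dots, 14\text{)} &: 3,\ 8,\ 3,\ 3,\ 9,\ 4,\ 7 \\
\text{after doubling} &: 6,\ 16,\ 6,\ 6,\ 18,\ 8,\ 14 \\
\text{digit sums} &: 6 + (1+6) + 6 + 6 + (1+8) + 8 + (1+4) = 47 \\
\text{undoubled digits (positions } 1, 3, \dots, 15\text{)} &: 1 + 4 + 9 + 5 + 6 + 4 + 1 + 3 = 33 \\
\text{total} &: 47 + 33 = 80, \qquad 80 \bmod 10 = 0 \;\Rightarrow\; \text{valid}
\end{aligned}
```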
The whole process can, of course, be placed into a method for ease of use, for example:
/**
 * Returns true if the card (e.g. MasterCard, Visa, etc.) number is valid
 * according to the 'Luhn Algorithm'.
 *
 * @param cardNumber (String)
 *
 * @return (boolean)
 */
public static boolean isValidCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    cardNumber = cardNumber.replaceAll("\\D", "");
    if (cardNumber.isEmpty()) {
        return false; // no digits at all
    }
    // Luhn algorithm
    int nDigits = cardNumber.length();
    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond) {
            d = d * 2;
        }
        // Add both digits to handle cases where
        // doubling produces a two-digit number
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}
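Both the switch snippet and isValidCardNumber() still strip separators with replaceAll("\\D", ""), which is itself a regex compiled and run on every call. If you want to drop regex completely, a plain loop does the same job. A sketch (the Digits class and digitsOnly() name are hypothetical) that keeps exactly the characters "\\D" would keep with default flags, namely ASCII 0 to 9:

```java
// Hypothetical regex-free replacement for cardNumber.replaceAll("\\D", "").
final class Digits {
    static String digitsOnly(CharSequence input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Without flags, "\\D" means "not [0-9]", so keep exactly '0'..'9'
            if (c >= '0' && c <= '9') {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

You could then write `cardNumber = Digits.digitsOnly(cardNumber);` wherever the replaceAll() call currently appears.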
To put this all together, your code might look something like this:
public static String validateCreditCardNumber(String cardNumber) {
    // Remove all characters other than digits
    cardNumber = cardNumber.replaceAll("\\D", "");
    // Guard: substring(0, 2) below would throw on fewer than 2 digits
    if (cardNumber.length() < 2) {
        return ("The UNKNOWN card number (" + cardNumber + ") is NOT VALID!");
    }
    String cardName; // Used to store the card's name
    switch (cardNumber.substring(0, 1)) {
        case "3":
            String typeNum = cardNumber.substring(0, 2);
            switch (typeNum) {
                case "34": case "37":
                    cardName = "American-Express";
                    break;
                case "35":
                    cardName = "JCB";
                    break;
                case "30": case "36": case "38": case "39":
                    cardName = "Diners-Club";
                    break;
                default:
                    cardName = "UNKNOWN";
            }
            break;
        case "4":
            cardName = "Visa";
            break;
        case "5":
            cardName = "Mastercard";
            break;
        case "6":
            cardName = "Discover";
            break;
        default:
            cardName = "UNKNOWN";
    }
    if (!cardName.equals("UNKNOWN") && isValidCardNumber(cardNumber)) {
        return ("The " + cardName + " card number (" + maskWithoutLast4Digits(cardNumber, '*') + ") is VALID!");
    }
    else {
        return ("The " + cardName + " card number (" + maskWithoutLast4Digits(cardNumber, '*') + ") is NOT VALID!");
    }
}
// Requires: import java.util.Arrays;
public static String maskWithoutLast4Digits(String input, char replacement) {
    if (input.length() <= 4) {
        return input; // Nothing to mask
    }
    char[] buf = input.toCharArray();
    Arrays.fill(buf, 0, buf.length - 4, replacement);
    return new String(buf);
}
/**
 * Returns true if the card (e.g. MasterCard, Visa, etc.) number is valid
 * according to the 'Luhn Algorithm'.
 *
 * @param cardNumber (String)
 *
 * @return (boolean)
 */
public static boolean isValidCardNumber(String cardNumber) {
    if (cardNumber == null || cardNumber.trim().isEmpty()) {
        return false;
    }
    cardNumber = cardNumber.replaceAll("\\D", "");
    if (cardNumber.isEmpty()) {
        return false; // no digits at all
    }
    // Luhn algorithm
    int nDigits = cardNumber.length();
    int nSum = 0;
    boolean isSecond = false;
    for (int i = nDigits - 1; i >= 0; i--) {
        int d = cardNumber.charAt(i) - '0';
        if (isSecond) {
            d = d * 2;
        }
        // Add both digits to handle cases where
        // doubling produces a two-digit number
        nSum += d / 10;
        nSum += d % 10;
        isSecond = !isSecond;
    }
    return (nSum % 10 == 0);
}
And to use the above:
// Demo card number...
String cardNumber = "371449635398431";
String isItValid = validateCreditCardNumber(cardNumber);
System.out.println(isItValid);
The console output would be:
The American-Express card number (***********8431) is VALID!
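To bring this back to your original obfuscate() method: it splits the text and then calls replace() once per detected card, and each replace() call rescans the whole string. The same ideas (no regex, a digit-only scan, and the Luhn check) can instead be applied in a single pass over each line or tag. This is a sketch under stated assumptions, not a drop-in replacement: a candidate is a run of ASCII digits optionally separated by single '-' or '_' characters; it is treated as a card if it has 13 to 19 digits and passes the Luhn check (network prefixes are not checked); and separators are dropped from masked output, as in your expected output3. The CardMasker class and its method names are hypothetical.

```java
// Hypothetical single-pass masker built on the assumptions described above.
final class CardMasker {

    static String obfuscate(CharSequence data, char mask) {
        StringBuilder out = new StringBuilder(data.length());
        int n = data.length();
        int i = 0;
        while (i < n) {
            if (!isDigit(data.charAt(i))) {
                out.append(data.charAt(i)); // ordinary text is copied as-is
                i++;
                continue;
            }
            // Collect a run of digits, allowing a single '-' or '_' between digits.
            int start = i;
            StringBuilder digits = new StringBuilder();
            while (i < n) {
                char c = data.charAt(i);
                if (isDigit(c)) {
                    digits.append(c);
                    i++;
                } else if ((c == '-' || c == '_') && i + 1 < n && isDigit(data.charAt(i + 1))) {
                    i++; // skip a separator that is followed by another digit
                } else {
                    break;
                }
            }
            int len = digits.length();
            if (len >= 13 && len <= 19 && passesLuhn(digits)) {
                for (int k = 0; k < len - 4; k++) {
                    out.append(mask);
                }
                out.append(digits, len - 4, len); // keep the last 4 digits
            } else {
                out.append(data, start, i); // not a card: copy the original text unchanged
            }
        }
        return out.toString();
    }

    // Luhn check: double every second digit from the right, subtracting 9 when
    // the result exceeds 9 (the same as adding its two digits).
    static boolean passesLuhn(CharSequence digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

Unlike your regex-based split, this treats a digit run glued to letters (for example `abc371449635398431`) as a candidate too, so decide which behavior you want before adopting it.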
I'm not sure where your output is going, but it may be best to write it to a file before displaying it, since displaying it will always limit your speed. Breaking the data into manageable chunks and processing them on multiple threads with an ExecutorService could also greatly increase speed, as could moving to a newer JDK (newer than Java 8) and using some of its newer methods.
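A minimal sketch of the chunking idea, assuming the input is already available as a list of lines and that each line can be masked independently (which holds as long as no card number spans two lines). The masking function is passed in as a parameter so that your own obfuscate() logic can be plugged in; the ParallelMasker class and maskAll() name are hypothetical, while ExecutorService, Executors.newFixedThreadPool() and invokeAll() are standard java.util.concurrent APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

// Hypothetical helper: masks lines in fixed-size chunks on a thread pool,
// preserving the original line order in the result.
final class ParallelMasker {

    static List<String> maskAll(List<String> lines, UnaryOperator<String> mask,
                                int chunkSize, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (int from = 0; from < lines.size(); from += chunkSize) {
                // Each task reads its own slice of the (unmodified) input list
                List<String> chunk = lines.subList(from, Math.min(from + chunkSize, lines.size()));
                tasks.add(() -> {
                    List<String> masked = new ArrayList<>(chunk.size());
                    for (String line : chunk) {
                        masked.add(mask.apply(line));
                    }
                    return masked;
                });
            }
            // invokeAll waits for every task and returns the futures in submission order
            List<String> result = new ArrayList<>(lines.size());
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                result.addAll(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

The chunk size and thread count are tuning knobs; a thread count near Runtime.getRuntime().availableProcessors() is a common starting point for CPU-bound work like this.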