java.util.regex.Pattern matcher causes high CPU usage

Time:11-22

We are having an issue with a regex validation that uses Pattern. It occurs in a Spring application that uses Hibernate's validation.

The snippet below shows the request object being validated:

@PostMapping
public ResponseEntity create(@RequestBody RequestObj request) {
  validationService.validate(request);
  .....

}

Regex pattern:

public class RequestObj {

  @Pattern(regexp = "^([a-zA-Z])+[-.'\\s]?[-a-zA-Z]*$", message = ValidationConstant.ERR_INVALID_FIRST_NAME)
  @NotNull(message = ValidationConstant.ERR_FIRST_NAME_EMPTY)
  @Size(max = 30, message = ValidationConstant.ERR_INVALID_NAME_SIZE)
  private String firstName;

}

When this validation runs, the thread's CPU usage occasionally reaches 100% (it works most of the time). The thread dump shows that the thread is stuck in the Pattern class.

"http-nio-8080-exec-4" #53 daemon prio=5 os_prio=0 tid=0x00007fce45f0d000 nid=0x44 runnable [0x00007fcdb3af6000]
   java.lang.Thread.State: RUNNABLE
        at java.util.regex.Pattern$5.isSatisfiedBy(Pattern.java:5265)
        at java.util.regex.Pattern$5.isSatisfiedBy(Pattern.java:5265)
        at java.util.regex.Pattern$5.isSatisfiedBy(Pattern.java:5265)
        at java.util.regex.Pattern$CharProperty.match(Pattern.java:3790)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4274)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4818)
        at java.util.regex.Pattern$Prolog.match(Pattern.java:4755)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$Begin.match(Pattern.java:3539)
        at java.util.regex.Matcher.match(Matcher.java:1270)
        at java.util.regex.Matcher.matches(Matcher.java:604)
        at org.hibernate.validator.internal.constraintvalidators.bv.PatternValidator.isValid(PatternValidator.java:60)
        at org.hibernate.validator.internal.constraintvalidators.bv.PatternValidator.isValid(PatternValidator.java:24)
        at org.hibernate.validator.internal.engine.constraintvalidation.ConstraintTree.validateSingleConstraint(ConstraintTree.java:171)
        at org.hibernate.validator.internal.engine.constraintvalidation.SimpleConstraintTree.validateConstraints(SimpleConstraintTree.java:68)
        at org.hibernate.validator.internal.engine.constraintvalidation.ConstraintTree.validateConstraints(ConstraintTree.java:73)
        at org.hibernate.validator.internal.metadata.core.MetaConstraint.doValidateConstraint(MetaConstraint.java:127)
        at org.hibernate.validator.internal.metadata.core.MetaConstraint.validateConstraint(MetaConstraint.java:120)
        at org.hibernate.validator.internal.engine.ValidatorImpl.validateMetaConstraint(ValidatorImpl.java:533)
        at org.hibernate.validator.internal.engine.ValidatorImpl.validateConstraintsForSingleDefaultGroupElement(ValidatorImpl.java:496)
        at org.hibernate.validator.internal.engine.ValidatorImpl.validateConstraintsForDefaultGroup(ValidatorImpl.java:465)
        at org.hibernate.validator.internal.engine.ValidatorImpl.validateConstraintsForCurrentGroup(ValidatorImpl.java:430)
        at org.hibernate.validator.internal.engine.ValidatorImpl.validateInContext(ValidatorImpl.java:380)
        at org.hibernate.validator.internal.engine.ValidatorImpl.validateCascadedAnnotatedObjectForCurrentGroup(ValidatorImpl.java:605)
        at org.hibernate.validator.internal.engine.ValidatorImpl.validateCascadedConstraints(ValidatorImpl.java:568)
        at org.hibernate.validator.internal.engine.ValidatorImpl.validateInContext(ValidatorImpl.java:389)
        at org.hibernate.validator.internal.engine.ValidatorImpl.validate(ValidatorImpl.java:169)

Is there an issue with my regex?

The regex validates a first name and allows letters plus a few special characters: ^([a-zA-Z])+[-.'\\s]?[-a-zA-Z]*$

CodePudding user response:

Do you actually use the first group? ([a-zA-Z])

I don't think so, because otherwise you would have noticed that it is not filled with all the letters up to the first non-letter character. With the quantifier outside the group, the group only keeps the last letter it matched.

You probably want to put the + sign into the group:

^([a-zA-Z]+)[-.'\\s]?[-a-zA-Z]*$

or not use a group at all if you don't need that part as a group (it is probably not used in your annotation):

^[a-zA-Z]+[-.'\\s]?[-a-zA-Z]*$
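To illustrate the point about the group, here is a minimal sketch that prints what group 1 captures in each variant. It assumes the quantifier after the group is a +, and the class and method names (GroupDemo, firstGroup) are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns capture group 1 after a full match, or null if the input does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String name = "Mary-Ann";
        // Quantifier outside the group: the group is overwritten on every repetition,
        // so only the last letter of the leading run survives.
        System.out.println(firstGroup("^([a-zA-Z])+[-.'\\s]?[-a-zA-Z]*$", name)); // prints y
        // Quantifier inside the group: the group holds the whole leading run of letters.
        System.out.println(firstGroup("^([a-zA-Z]+)[-.'\\s]?[-a-zA-Z]*$", name)); // prints Mary
    }
}
```

Since the @Pattern annotation only cares whether the whole string matches, the captured value is never read there, which is why dropping the group should be safe.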

CodePudding user response:

Backtracking over repetitions can make a regex unexpectedly costly.

In this case the optional punctuation character between the letters may take long: when it matches nothing, the split between the two runs of letters can fall at any position, and on a failing input the engine may try each of them.

Instead of

"^[a-zA-Z]+[-.'\\s]?[-a-zA-Z]*$"

try

"^[a-zA-Z]+([-.'\\s][-a-zA-Z]*)?$"

This enters the part that starts with punctuation only when a punctuation character is actually present.
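Before swapping the pattern, it may be worth checking that both versions accept and reject the same names. The following is a rough sketch, assuming the lost quantifier in both patterns is a +; the class name, method name and sample inputs are invented for illustration.

```java
import java.util.regex.Pattern;

class NamePatternCheck {
    // Pattern with the optional punctuation character between two letter runs.
    static final Pattern ORIGINAL = Pattern.compile("^[a-zA-Z]+[-.'\\s]?[-a-zA-Z]*$");
    // Pattern where the trailing run is only attempted after a punctuation character.
    static final Pattern REWRITTEN = Pattern.compile("^[a-zA-Z]+([-.'\\s][-a-zA-Z]*)?$");

    // True when both patterns give the same accept/reject verdict for the input.
    static boolean sameVerdict(String input) {
        return ORIGINAL.matcher(input).matches() == REWRITTEN.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Mary", "Mary-Ann", "O'Neil", "Jean Luc", "J.R",
            "Mary--Ann", "-Mary", "Mary1", "", "Ann Marie Lee"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" accepted=" + REWRITTEN.matcher(s).matches()
                    + " sameVerdict=" + sameVerdict(s));
        }
    }
}
```

A handful of samples is not a proof of equivalence, but it catches accidental changes in what the validation accepts.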

In general, run a microbenchmark (with a benchmark library), as the outcome may not be obvious.

Even so, regex matching will remain costly.
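As a starting point before setting up a real benchmark library such as JMH, a crude timing loop like the one below can give a first impression. It is only a sketch: System.nanoTime loops are distorted by JIT warm-up, and the class name, helper names, input length and round count are arbitrary assumptions. The patterns again assume the lost quantifier is a +.

```java
import java.util.regex.Pattern;

class RoughRegexTiming {
    // Runs matches() `rounds` times and returns the average time per call in nanoseconds.
    static long averageNanos(Pattern p, String input, int rounds) {
        int matched = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            if (p.matcher(input).matches()) {
                matched++;
            }
        }
        long elapsed = System.nanoTime() - start;
        // Using the counter keeps the loop body from being optimized away entirely.
        if (matched != 0) {
            System.out.println("unexpected matches: " + matched);
        }
        return elapsed / rounds;
    }

    // Builds an input that can only fail at the very end: many letters, then '!'.
    static String lettersThenBang(int letters) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < letters; i++) {
            sb.append('a');
        }
        return sb.append('!').toString();
    }

    public static void main(String[] args) {
        Pattern original = Pattern.compile("^[a-zA-Z]+[-.'\\s]?[-a-zA-Z]*$");
        Pattern rewritten = Pattern.compile("^[a-zA-Z]+([-.'\\s][-a-zA-Z]*)?$");
        String input = lettersThenBang(2000);

        // One untimed pass each as a very rough warm-up.
        averageNanos(original, input, 20);
        averageNanos(rewritten, input, 20);

        System.out.println("original : " + averageNanos(original, input, 50) + " ns/call");
        System.out.println("rewritten: " + averageNanos(rewritten, input, 50) + " ns/call");
    }
}
```

The numbers depend heavily on the JVM and the input, so treat them as a hint about which cases to measure properly, not as a result.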
