Write regex for PMD custom rule

I created the following custom rule for PMD, but when I run it I get an error. If I replace the regex with a trivial one like "a", it works. I cannot understand what is wrong.

<?xml version="1.0"?>

<ruleset name="Custom Rules"
    xmlns="http://pmd.sourceforge.net/ruleset/2.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/2.0.0 https://pmd.sourceforge.io/ruleset_2_0_0.xsd">

    <rule name="LongMethodName"
        language="java"
        message="Method name too long"
        class="net.sourceforge.pmd.lang.rule.XPathRule">
        <description>
            Method names should be composed of fewer than five words
        </description>
        <priority>4</priority>
        <properties>
            <property name="version" value="2.0" />
            <property name="xpath">
                <value>
                    <![CDATA[
                    //MethodDeclaration[count(tokenize(@Name, '(?<=[a-z])(?=[A-Z])')) + 1 > 5]
                    ]]>
                </value>
            </property>
        </properties>
    </rule>

</ruleset>

The error I get is the following. I get it for each file in the project I'm analyzing.


Nov 04, 2022 10:45:42 PM net.sourceforge.pmd.RuleSet apply
WARNING: Exception applying rule LongMethodName on file /Users/francescobresciani/MSDE/1sem/software-design-modeling/sdem-ass2/fastjson-master/src/main/java/com/alibaba/fastjson/parser/SymbolTable.java, continuing with next rule
java.lang.RuntimeException: net.sf.saxon.trans.XPathException: Error at character 1 in regular expression "(?<=[a-z])(?=[A-Z])": expected ())
        at net.sourceforge.pmd.lang.rule.xpath.SaxonXPathRuleQuery.initializeXPathExpression(SaxonXPathRuleQuery.java:272)
        at net.sourceforge.pmd.lang.rule.xpath.SaxonXPathRuleQuery.evaluate(SaxonXPathRuleQuery.java:113)
        at net.sourceforge.pmd.lang.rule.XPathRule.evaluate(XPathRule.java:176)
        at net.sourceforge.pmd.lang.rule.XPathRule.apply(XPathRule.java:158)
        at net.sourceforge.pmd.RuleSet.apply(RuleSet.java:670)
        at net.sourceforge.pmd.RuleSets.apply(RuleSets.java:163)
        at net.sourceforge.pmd.SourceCodeProcessor.processSource(SourceCodeProcessor.java:209)
        at net.sourceforge.pmd.SourceCodeProcessor.processSourceCodeWithoutCache(SourceCodeProcessor.java:118)
        at net.sourceforge.pmd.SourceCodeProcessor.processSourceCode(SourceCodeProcessor.java:100)
        at net.sourceforge.pmd.SourceCodeProcessor.processSourceCode(SourceCodeProcessor.java:62)
        at net.sourceforge.pmd.processor.PmdRunnable.call(PmdRunnable.java:89)
        at net.sourceforge.pmd.processor.PmdRunnable.call(PmdRunnable.java:30)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: net.sf.saxon.trans.XPathException: Error at character 1 in regular expression "(?<=[a-z])(?=[A-Z])": expected ())
        at net.sf.saxon.java.JRegularExpression.<init>(JRegularExpression.java:70)
        at net.sf.saxon.java.JavaPlatform.compileRegularExpression(JavaPlatform.java:198)
        at net.sf.saxon.functions.Matches.tryToCompile(Matches.java:218)
        at net.sf.saxon.functions.Tokenize.maybePrecompile(Tokenize.java:45)
        at net.sf.saxon.functions.Tokenize.simplify(Tokenize.java:36)
        at net.sf.saxon.expr.ExpressionVisitor.simplify(ExpressionVisitor.java:159)
        at net.sf.saxon.expr.FunctionCall.simplifyArguments(FunctionCall.java:100)
        at net.sf.saxon.expr.FunctionCall.simplify(FunctionCall.java:88)
        at net.sf.saxon.expr.ExpressionVisitor.simplify(ExpressionVisitor.java:159)
        at net.sf.saxon.expr.BinaryExpression.simplify(BinaryExpression.java:45)
        at net.sf.saxon.expr.ArithmeticExpression.simplify(ArithmeticExpression.java:42)
        at net.sf.saxon.expr.ExpressionVisitor.simplify(ExpressionVisitor.java:159)
        at net.sf.saxon.expr.BinaryExpression.simplify(BinaryExpression.java:45)
        at net.sf.saxon.expr.ExpressionVisitor.simplify(ExpressionVisitor.java:159)
        at net.sf.saxon.expr.FilterExpression.simplify(FilterExpression.java:130)
        at net.sf.saxon.expr.ExpressionVisitor.simplify(ExpressionVisitor.java:159)
        at net.sf.saxon.expr.SlashExpression.simplify(SlashExpression.java:122)
        at net.sf.saxon.expr.ExpressionVisitor.simplify(ExpressionVisitor.java:159)
        at net.sf.saxon.expr.ExpressionTool.make(ExpressionTool.java:74)
        at net.sf.saxon.sxpath.XPathEvaluator.createExpression(XPathEvaluator.java:167)
        at net.sourceforge.pmd.lang.rule.xpath.SaxonXPathRuleQuery.initializeXPathExpression(SaxonXPathRuleQuery.java:269)

I tested the regex on regex101 and it works. I tested the XPath expression on xpather and it looks valid. I also tested the XPath expression on freeformatter, where it looks NOT valid. It says: Unable to perform XPath operation. Syntax error at char 1 in regular expression: No expression before quantifier

The following is the snippet I checked the XPath expression against:

<root>
 <MethodDeclaration Name="shortName"/>
 <MethodDeclaration Name="thisMethodNameIsVeryVeryLong"/>
</root>

The following is the exact string I entered in xpather and freeformatter:

//MethodDeclaration[count(tokenize(@Name, '(?<=[a-z])(?=[A-Z])')) + 1 > 5]

CodePudding user response：

The leading character in your regular expression, (, marks the start of a group.

The next character, ?, is a "quantifier" (like * or +); it specifies how many times the preceding expression may occur (it means "either zero or one"). But there is no preceding expression.

Are you trying to match the literal character ?? If so, you should escape it like so \?.

Are you trying to match the literal character ( either zero or one time? If so, you should escape that character \( so it's not interpreted as the start of the group.

CodePudding user response：

I suspect that positive lookbehind ((?<=...)) and positive lookahead ((?=...)) are not supported by the regex syntax in XPath (at least XPath 2.0). The supported regex syntax is described here: https://www.w3.org/TR/xmlschema-2/#regexs and the tokenize function in XPath 2.0 is described here: https://www.w3.org/TR/xquery-operators/#func-tokenize (that document also contains more information about regular expressions in XPath).

Searching for an alternative solution, I found the question "Regex to split camel case".

The general idea is: first replace every lowercase character that is immediately followed by an uppercase character with the same two characters separated by a space, and then tokenize (split) the resulting string on that space.

This works with the following XPath:

//MethodDeclaration[count(tokenize(string(replace(@Name, '([a-z])([A-Z])', '$1 $2')), ' ')) + 1 > 5]

The conversion to a string (the string(...) function call) is only required by http://xpather.com/ - but it doesn't hurt. The expression also works with PMD.

Here's the example: http://xpather.com/f9SD9WMX
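
For completeness, here is a sketch of how the ruleset from the question might look with the replace-then-tokenize expression dropped in. Everything except the XPath expression and the description wording is copied from the question. The class attribute is an assumption: it does not appear in the posted ruleset, but the stack trace shows the rule running as net.sourceforge.pmd.lang.rule.XPathRule. Treat this as a starting point rather than a verified configuration.

```xml
<?xml version="1.0"?>

<ruleset name="Custom Rules"
    xmlns="http://pmd.sourceforge.net/ruleset/2.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/2.0.0 https://pmd.sourceforge.io/ruleset_2_0_0.xsd">

    <!-- class is assumed from the stack trace; it is not in the posted ruleset -->
    <rule name="LongMethodName"
        language="java"
        message="Method name too long"
        class="net.sourceforge.pmd.lang.rule.XPathRule">
        <description>
            Method names should be composed of fewer than five words
        </description>
        <priority>4</priority>
        <properties>
            <!-- XPath 2.0 is needed for replace() and tokenize() -->
            <property name="version" value="2.0" />
            <property name="xpath">
                <value>
                    <![CDATA[
                    //MethodDeclaration[count(tokenize(string(replace(@Name, '([a-z])([A-Z])', '$1 $2')), ' ')) + 1 > 5]
                    ]]>
                </value>
            </property>
        </properties>
    </rule>

</ruleset>
```

With the snippet from the question, replace turns thisMethodNameIsVeryVeryLong into "this Method Name Is Very Very Long", which tokenizes into 7 words, so 7 + 1 > 5 holds and the method is reported. shortName becomes "short Name" (2 words), so 2 + 1 > 5 is false and it is not reported. Note that only a lowercase-to-uppercase boundary is split, so a name like toJSONString would count as two words.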