We need to scan a large library of HTML, XML, and Java files, any of which can contain Java System.out.println calls. The issue is that I only need to find a specific subset of those calls.
Example 1:
System.out.println("my job code is: " + var.jobcode);
Example 2:
System.out.println("my jc is: " + var.jc);
Example 3:
System.out.println("my jbc is: " + var.jbc);
I have tried to get this with the following:
Get-ChildItem C:\my\folder\path -Recurse | Where-Object FullName -Match ".*C:\\my\\folder\\path*" | Where-Object FullName -Match ".*." | Select-String -Pattern '(System\.out\.println+(.*?job)\/?[^)]+[)]\s*;)|(System\.out\.println+(.*?jc)\/?[^)]+[)]\s*;)|(System\.out\.println+(.*?jbc)\/?[^)]+[)]\s*;){99}' -List | Select Path,Line
This finds the files I want, but it also returns false positives: files containing lines like the following show up in the results by mistake.
System.out.println ("component printout: item"); System.out.println (""); <td style="word-break: break-all;word-wrap:break-word;font-size:12px;" align="left">Job Codes</td><td style="word-break: break-all;word-wrap:break-word;font-size:12px;" align="left">
So any file with a line where a System.out.println(); call is followed, anywhere later on that line, by the word "job" gets picked up when it shouldn't be.
I have to run this over several thousand files on a semi-regular basis and need to output the file path/name and line the offending code is in.
How can I make this regex specific enough to match only lines like my examples above, without picking up the other files?
CodePudding user response:
Some notes about the pattern that you tried:
- You have 3 alternations whose only difference is the word that must be present. You can use a single pattern with an alternation for those words in a non-capturing group instead.
- `println+` matches `printl` followed by one or more `n` characters.
- The non-greedy dot `.*?` can still overmatch, as the dot can also match `"` and `)`.
- The quantifier `{99}` repeats the whole group exactly 99 times, and only for the last alternation, which seems a bit off in the pattern.
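The second and third notes can be checked in isolation. The sketch below uses Python's `re` module rather than PowerShell, purely as a quick self-contained check; for these inputs, `+`, the lazy `.*?`, and negated character classes behave the same way in .NET regex. The shortened sample line and the two small test patterns are assumptions made for illustration. `re.IGNORECASE` mirrors `Select-String`, which ignores case unless `-CaseSensitive` is given, and that is also why `job` matches `Job Codes`.

```python
import re

# "println+" means "printl" followed by one or more "n" characters,
# so it also accepts misspellings with extra n's.
assert re.fullmatch(r"println+", "println")
assert re.fullmatch(r"println+", "printlnnn")

# Shortened version of the false-positive line from the question (assumption).
sample = 'System.out.println (""); <td>Job Codes</td>'

# A lazy dot still crosses quotes, parentheses and tags when that is the
# only way to reach "job".
lazy = re.search(r"println.*?job", sample, re.IGNORECASE)
print(lazy.group())  # println (""); <td>Job

# A negated character class cannot leave the string literal, so the
# same line is rejected.
strict = re.search(r'println \("[^"]*job', sample, re.IGNORECASE)
print(strict)  # None
```

Keeping the search inside the quoted string is the idea the more specific pattern builds on.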
You might make the pattern a bit more specific:
System\.out\.println\("[^":]*\s(?:job|jb?c)\s[^":]*:[^"]*"[^)]*\);
Explanation
- `System\.out\.println\(` Match `System.out.println(`
- `"[^":]*` Match `"` and then optional chars other than `"` and `:`
- `\s(?:job|jb?c)\s` Match either `job`, `jbc` or `jc` between whitespace chars (or use word boundaries: `\b(?:job|jb?c)\b`)
- `[^":]*:[^"]*"` Optionally match any char other than `"` and `:`, then match `:` followed by optional chars other than `"`, then the closing `"`
- `[^)]*\);` Match optional chars other than `)`, then match `)` and `;`
See a regex demo.
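As a quick sanity check outside PowerShell, the pattern can be run against the lines from the question. This is a minimal sketch in Python, assuming the `+` concatenation in the examples, a shortened `<td>` tag in the false positive, and one hypothetical extra line without the space before `(`. The pattern itself is copied unchanged and uses only syntax that .NET regex and Python's `re` share.

```python
import re

# First pattern from the answer, unchanged. re.IGNORECASE mirrors
# Select-String, which ignores case unless -CaseSensitive is given.
PATTERN = re.compile(
    r'System\.out\.println\("[^":]*\s(?:job|jb?c)\s[^":]*:[^"]*"[^)]*\);',
    re.IGNORECASE,
)

# The three examples from the question. The "+" is an assumption;
# [^)]* accepts the line with or without it.
wanted = [
    'System.out.println("my job code is: " + var.jobcode);',
    'System.out.println("my jc is: " + var.jc);',
    'System.out.println("my jbc is: " + var.jbc);',
]

# The false positive from the question (td attributes shortened), plus a
# hypothetical variant without the space before "(".
unwanted = [
    'System.out.println ("component printout: item"); '
    'System.out.println (""); <td align="left">Job Codes</td>',
    'System.out.println("component printout: item"); '
    'System.out.println(""); <td align="left">Job Codes</td>',
]

matched = [bool(PATTERN.search(line)) for line in wanted]
rejected = [PATTERN.search(line) is None for line in unwanted]
print(matched)   # [True, True, True]
print(rejected)  # [True, True]
```

The unwanted lines fail because the keyword has to appear inside the first string literal, before its colon, and `[^":]*` cannot step past a quote to reach `Job Codes`.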
An alternative that does not require a `:` and uses word boundaries instead of whitespace:
System\.out\.println\("[^":]*\b(?:job|jb?c)\b[^"]*"[^)]*\);
See another regex demo.
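Since the goal is a file path plus the offending line, here is a hedged sketch of how this alternative pattern could drive such a scan. It is written in Python only to keep the check self-contained; the helper name `find_offending_lines`, the temporary files, and their contents are all hypothetical. Unlike `Select-String -List`, it reports every matching line rather than the first match per file.

```python
import re
import tempfile
from pathlib import Path

# Alternative pattern from the answer, unchanged. re.IGNORECASE mirrors
# Select-String's default case-insensitive matching.
ALT = re.compile(
    r'System\.out\.println\("[^":]*\b(?:job|jb?c)\b[^"]*"[^)]*\);',
    re.IGNORECASE,
)

def find_offending_lines(root):
    """Yield (path, line_number, line) for each matching line under root."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if ALT.search(line):
                yield path, number, line.strip()

# Hypothetical two-file tree: one real hit, one false-positive candidate.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Hit.java").write_text(
        'int x;\nSystem.out.println("my jc is: " + var.jc);\n', encoding="utf-8"
    )
    (Path(tmp) / "page.html").write_text(
        'System.out.println (""); <td>Job Codes</td>\n', encoding="utf-8"
    )
    hits = [(p.name, n, line) for p, n, line in find_offending_lines(tmp)]

print(hits)  # [('Hit.java', 2, 'System.out.println("my jc is: " + var.jc);')]

# Without the mandatory colon, this variant also accepts a literal such as
# "job done", which the first pattern would reject.
no_colon = bool(ALT.search('System.out.println("job done");'))
print(no_colon)  # True
```

Dropping the colon requirement widens the net, so this variant may need a review pass if the code base prints many unrelated messages containing the word job.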