Get count of exact word match within a Text file

Time:09-05

The requirement is to get the count of exact matches for the word "test". In the following example, it should be 1:

testing 1 2 3 "test" testing
Tester testing 2345 tes testers testings testing
test

I tried the code below:

(Get-Content "C:\Users\abc\Desktop\POC\Findstring.txt" | 
    Select-String -Pattern "test" -AllMatches).matches.count

But it returns 9, because it matches substrings, much like a LIKE match (it also counts tester, testing, etc.).

How can we make sure we get the count of exact matches only, rather than the substring matching of a LIKE operator (as in SQL)?

CodePudding user response:

tl;dr

  • Use regex \btest\b as the -Pattern argument so as to match test as a whole word only.

  • Pass your input file path directly to Select-String's -LiteralPath parameter, which is much faster and more efficient than streaming the individual lines from the file via Get-Content.

(
  Select-String -AllMatches `
                -Pattern '\btest\b' `
                -LiteralPath C:\Users\abc\Desktop\POC\Findstring.txt 
).Matches.Count 

Note: The command is spread across multiple lines for readability. To convert it to a single-line form, also remove the line-ending ` (backtick) characters, which act as line continuations.


Your intent is to limit matching test substrings to whole words.

Since Select-String uses regexes (regular expressions), you can do so by enclosing the substring in word-boundary assertions, \b, as Theo advises, i.e. '\btest\b'.

Also note that Select-String - like PowerShell in general - is case-insensitive by default; to match case-sensitively, add the -CaseSensitive switch.
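
To see the effect of the word-boundary assertions without touching a file, here is a minimal sketch that pipes the three sample lines from the question to Select-String as an in-memory array. The variable names ($lines, $substringCount, etc.) are made up for illustration, and the counts in the comments apply to this sample only.

```powershell
# The three sample lines from the question, held in memory (assumption: no file needed for the demo).
$lines = @(
  'testing 1 2 3 "test" testing'
  'Tester testing 2345 tes testers testings testing'
  'test'
)

# Plain substring match, case-insensitive by default: also counts Tester, testing, ...
$substringCount = ($lines | Select-String -Pattern 'test' -AllMatches).Matches.Count          # 9

# Whole-word match: counts the quoted "test" on line 1 and the bare test on line 3.
$wholeWordCount = ($lines | Select-String -Pattern '\btest\b' -AllMatches).Matches.Count      # 2

# Substring match again, but case-sensitive: Tester no longer counts.
$caseSensitiveCount = ($lines | Select-String -Pattern 'test' -AllMatches -CaseSensitive).Matches.Count  # 8

"$substringCount / $wholeWordCount / $caseSensitiveCount"
```

Note that the whole-word count is 2 rather than 1 here, because \b also matches between a double quote and the letter t.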


Variation: also ignoring the word test when enclosed in "..."

If you additionally want to ignore "test" substrings (i.e. double-quoted instances of the word), you must amend your regex to also include a negative look-behind assertion, (?<!...), in order to preclude a " preceding the word:

(
  Select-String -AllMatches `
                -Pattern '(?<!")\btest\b' `
                -LiteralPath C:\Users\abc\Desktop\POC\Findstring.txt 
).Matches.Count 

See this regex101.com page.
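
As a quick check of the look-behind variant, the following sketch runs the same pattern against the question's sample lines held in an in-memory array instead of a file. The variable names ($lines, $unquotedCount) are illustrative only.

```powershell
# The three sample lines from the question (assumption: in-memory stand-in for the file).
$lines = @(
  'testing 1 2 3 "test" testing'
  'Tester testing 2345 tes testers testings testing'
  'test'
)

# (?<!") rejects a match that is immediately preceded by a double quote,
# so the quoted "test" on line 1 is skipped and only the bare test on line 3 counts.
$unquotedCount = ($lines | Select-String -Pattern '(?<!")\btest\b' -AllMatches).Matches.Count  # 1

$unquotedCount
```

Keep in mind that the look-behind only inspects the character before the word, so an instance with just a trailing quote, such as test", would still be counted.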

CodePudding user response:

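Currently, you search for the pattern test, which matches as a substring and is therefore also found in testing, testers, etc. Splitting the lines into space-separated tokens and anchoring the pattern with ^ and $ should do the trick: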

((Get-Content "C:\tmp\testdata.txt") -split " " | Select-String -Pattern '^(test)$' -AllMatches).count
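
A possible variation on the same token-based idea, offered as a sketch rather than a drop-in replacement: split on runs of any whitespace (so tabs and repeated spaces are handled too) and filter the tokens with the -eq operator instead of a regex. The sample lines and variable names below are assumptions for illustration.

```powershell
# The three sample lines from the question (assumption: in-memory stand-in for Get-Content output).
$lines = @(
  'testing 1 2 3 "test" testing'
  'Tester testing 2345 tes testers testings testing'
  'test'
)

# Split every line on runs of whitespace; with an array on the left, -split returns one flat token list.
$tokens = $lines -split '\s+'

# With an array on the left, -eq acts as a filter and returns the matching elements.
# The token "test" (with its quotes) is not equal to test, so only the bare word counts.
$exactCount = @($tokens -eq 'test').Count    # 1

# -ceq is the case-sensitive counterpart of -eq.
$exactCountCaseSensitive = @($tokens -ceq 'test').Count    # 1

$exactCount
```

Because tokens keep their punctuation, this counts only free-standing instances of the word; something like test, (with a trailing comma) would not be counted.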