How to count unique strings with special characters in a Windows batch file

Time:10-16

I am using the sample from "Batch file to count occurrences" to count unique strings. However, my strings may include special characters, e.g. "orange c=US", and when a string contains special characters the count doesn't work.

input:

[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,apple c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,apple c=CA,xxxx,xxxx

output:

orange c=US  3
orange c=CA  2
apple c=US 1
apple c=CA 1

code:

set "file=test.out.log"
(
for /f "tokens=1,2,3,4,5,6,7 delims=," %%a in ('findstr /I /N /C:"[SUCCESS]" "%file%"') do (
    rem %%d extracts "orange c=US"
    set "t=%%d"
    call :handleType
)

rem Enumerate find types and echo type and number of occurrences
rem The inner loop is to allow underscores inside type
for /f "tokens=1,* delims=_" %%a in ('set _type_ 2^>nul') do (
    for /f "tokens=1,2 delims==" %%v in ("%%b") do (
      echo %%v %%w
    )
)) > output.txt

rem Clean and exit
endlocal
exit /b
pause > nul
:handleType
  rem %t% ----> orange c=US
  set "t=%t:'=%"

for /f "tokens=*" %%t in ("%t:"=%") do (
    set /a "_type_%%~t +=1"
)
goto :EOF

Answer:

Just replace the special characters (= and the space) with another character for the count, and put the original characters back for the output:

EDIT: Code modified as requested in a later comment.

@echo off
setlocal EnableDelayedExpansion

rem Count items
for /F "tokens=4 delims=," %%d in ('findstr /I /L "[SUCCESS]" test.txt') do (
   set "item=%%d"

   rem Replace special characters
   for %%a in ("+=PLUS" "/=SLASH") do (
      for /F "tokens=1,2 delims==" %%b in (%%a) do set "item=!item:%%b=%%c!"
   )

   rem Separate on SPace and equal-sign characters
   for /F "tokens=1,2,3 delims== " %%x in ("!item!") do (
      set /A "count[%%x_%%y_%%z] +=1"
   )
)

REM set count[

rem Show counts
for /F "tokens=2-5 delims=[_]=" %%a in ('set count[') do (

   rem Replace back special characters
   set "item=%%a"
   for %%a in ("PLUS=+" "SLASH=/") do (
      for /F "tokens=1,2 delims==" %%b in (%%a) do set "item=!item:%%b=%%c!"
   )

   echo !item! %%b=%%c  %%d
)

Input:

[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,apple c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,apple c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,Grape+ c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,Grape+ c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,Grape/L c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,Grape/L c=CA,xxxx,xxxx

Output:

apple c=CA  1
apple c=US  1
Grape+ c=CA  1
Grape+ c=US  1
Grape/L c=CA  1
Grape/L c=US  1
orange c=CA  2
orange c=US  3
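To see how the "Show counts" loop gets the name, key, value and count back out of each variable, here is a minimal sketch that runs the same FOR /F split on a single literal line. The text count[orange_c_US]=3 is an assumption standing in for one line printed by `set count[`; every one of [ _ ] and = acts as a delimiter, and adjacent delimiters such as ]= count as a single break.

```batch
@echo off
rem Sketch only: the quoted literal stands in for one line of "set count[" output.
rem With delimiters [ _ ] and = the line splits into five pieces:
rem   count, orange, c, US, 3   (the adjacent "]=" is treated as one break)
rem tokens=2-5 keeps the 2nd to 5th pieces in the four loop variables.
for /F "tokens=2-5 delims=[_]=" %%a in ("count[orange_c_US]=3") do (
   echo name=%%a  key=%%b  value=%%c  count=%%d
)
```

Because _ is one of the delimiters, a name that itself contains an underscore would be split into extra tokens and shift the remaining fields; the answer's code as written does not handle that case.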

Answer:

Here's an alternative method, available from Windows 10 onwards, which may help you (subject to whatever unspecified 'special characters' exist within the target field of your CSV input file):

@Echo Off
SetLocal EnableExtensions DisableDelayedExpansion
Set "ifile=inputfile.csv"
Set "tfile=%TEMP%\$.log"
For /F "Tokens=4 Delims=," %%G In ('%SystemRoot%\System32\findstr.exe /I /R
 /C:"^\[SUCCESS\][^,][^,]*,[^,][^,]*,[^,][^,]*,[^,][^,]*" "%ifile%" 2^>NUL'
) Do (Echo %%G) 1>>"%tfile%"
If Not Exist "%tfile%" GoTo :EOF
For /F Delims^=^ EOL^= %%G In ('%SystemRoot%\System32\sort.exe /Unique "%tfile%"
 ') Do (SetLocal EnableDelayedExpansion & Set "}=0"
    For /F Delims^=^ EOL^= %%H In ('%SystemRoot%\System32\findstr.exe /I /L /X
     /C:"%%G" "%tfile%"') Do Set /A } += 1
    Echo %%G,!}!& EndLocal)
Del "%tfile%"
Pause

Expected output (I used a comma as the delimiter here, because commas should not occur in the field string):

apple c=CA,1
apple c=US,1
orange c=CA,2
orange c=US,3

EDIT:

It should have no difficulty with the additional example strings Grape+ c=US and Grape/L c=CA from your comment on another answer.
