I am using the sample from "Batch file to count occurrences" to count unique strings. However, my strings may include special characters, e.g. "orange c=US", and when they do, the count doesn't work.
input:
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,apple c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,apple c=CA,xxxx,xxxx
output:
orange c=US 3
orange c=CA 2
apple c=US 1
apple c=CA 1
code:
set "file=test.out.log"
(
for /f "tokens=1,2,3,4,5,6,7 delims=," %%a in ('findstr /I /N /C:"[SUCCESS]" "%file%"') do (
rem %%d holds the fourth field, e.g. "orange c=US"
set "t=%%d"
call :handleType
)
rem Enumerate find types and echo type and number of occurrences
rem The inner loop is to allow underscores inside type
for /f "tokens=1,* delims=_" %%a in ('set _type_ 2^>nul') do (
for /f "tokens=1,2 delims==" %%v in ("%%b") do (
echo %%v %%w
)
)) > output.txt
rem Clean and exit
endlocal
exit /b
pause > nul
:handleType
rem %t% ----> orange c=US
set "t=%t:'=%"
for /f "tokens=*" %%t in ("%t:"=%") do (
set /a "_type_%%~t+=1"
)
goto :EOF
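To illustrate why the counter in the question breaks, here is a minimal sketch. It assumes the failure comes from `set /a` treating the space and `=` in the name as part of an arithmetic expression; the variable names are only examples taken from the question's data.

```bat
@echo off
setlocal
rem A name without special characters works as a counter
set /a "_type_orange+=1"
set /a "_type_orange+=1"
rem List the counter variable and its value
set _type_orange
rem Here the space and "=" are parsed as part of the expression itself,
rem so cmd is expected to report an error instead of incrementing a counter
set /a "_type_orange c=US+=1"
```

The first counter should be listed with its value, while the last line is expected to fail, which matches the behaviour described in the question.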
CodePudding user response:
Just replace the special character = (and the space) with another one for the count, and put the original characters back for the output:
EDIT: Code modified as requested in a later comment.
@echo off
setlocal EnableDelayedExpansion
rem Count items
for /F "tokens=4 delims=," %%d in ('findstr /I /L "[SUCCESS]" test.txt') do (
set "item=%%d"
rem Replace special characters
for %%a in ("+=PLUS" "/=SLASH") do (
for /F "tokens=1,2 delims==" %%b in (%%a) do set "item=!item:%%b=%%c!"
)
rem Separate on space and equal-sign characters
for /F "tokens=1,2,3 delims== " %%x in ("!item!") do (
set /A "count[%%x_%%y_%%z]+=1"
)
)
REM set count[
rem Show counts
for /F "tokens=2-5 delims=[_]=" %%a in ('set count[') do (
rem Replace back special characters
set "item=%%a"
for %%a in ("PLUS=+" "SLASH=/") do (
for /F "tokens=1,2 delims==" %%b in (%%a) do set "item=!item:%%b=%%c!"
)
echo !item! %%b=%%c %%d
)
Input:
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,orange c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,apple c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,apple c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,Grape c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,Grape c=CA,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,Grape/L c=US,xxxx,xxxx
[SUCCESS] xxxx,xxxx,xxxx,Grape/L c=CA,xxxx,xxxx
Output:
apple c=CA 1
apple c=US 1
Grape c=CA 1
Grape c=US 1
Grape/L c=CA 1
Grape/L c=US 1
orange c=CA 2
orange c=US 3
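The round trip described in this answer can be seen in isolation with a single value. This is only a sketch: the sample value is hard-coded, `key` is a hypothetical variable name, and it assumes the item itself contains no underscore.

```bat
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample value; in the answer it comes from the 4th CSV field
set "item=orange c=US"
rem Split on space and equal sign, then join the parts with underscores
for /F "tokens=1,2,3 delims== " %%x in ("!item!") do set "key=%%x_%%y_%%z"
echo key: !key!
rem Split the key again and put the original separators back
for /F "tokens=1,2,3 delims=_" %%a in ("!key!") do echo restored: %%a %%b=%%c
```

The key (`orange_c_US` here) is safe to use inside a `set /A` variable name, and the second loop rebuilds `orange c=US` for display.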
CodePudding user response:
Here's an alternative method, available from Windows 10 onwards, which may assist you (subject to whatever secret 'special characters' exist within the target field of your CSV input file):
@Echo Off
SetLocal EnableExtensions DisableDelayedExpansion
Set "ifile=inputfile.csv"
Set "tfile=%TEMP%\$.log"
For /F "Tokens=4 Delims=," %%G In ('%SystemRoot%\System32\findstr.exe /I /R
/C:"^\[SUCCESS\][^,][^,]*,[^,][^,]*,[^,][^,]*,[^,][^,]*" "%ifile%" 2^>NUL'
) Do (Echo %%G) 1>>"%tfile%"
If Not Exist "%tfile%" GoTo :EOF
For /F Delims^=^ EOL^= %%G In ('%SystemRoot%\System32\sort.exe /Unique "%tfile%"
') Do (SetLocal EnableDelayedExpansion & Set "}=0"
For /F Delims^=^ EOL^= %%H In ('%SystemRoot%\System32\findstr.exe /I /L /X
/C:"%%G" "%tfile%"') Do Set /A } += 1
Echo %%G,!}!& EndLocal)
Del "%tfile%"
Pause
Expected output, (I used a comma delimiter here, as those should not exist in the field string):
apple c=CA,1
apple c=US,1
orange c=CA,2
orange c=US,3
[EDIT /]
It should have no difficulties with your additional example strings, Grape c=US and Grape/L c=CA, from your comment to another answer.
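The counting step above depends on findstr matching whole lines literally (`/L` with `/X`), so a field such as `orange c=US` needs no escaping. Here is a minimal sketch of that one step, assuming a hypothetical temp file name and using `find /C /V ""` to count the matching lines in place of the answer's `Set /A` loop:

```bat
@echo off
setlocal
rem Hypothetical temp file holding one extracted field per line
set "tfile=%TEMP%\fields_demo.tmp"
> "%tfile%" (
  echo orange c=US
  echo orange c=CA
  echo orange c=US
)
rem /L = literal search, /X = the whole line must match;
rem find /C /V "" counts the lines piped into it
findstr /L /X /C:"orange c=US" "%tfile%" | find /C /V ""
del "%tfile%"
```

For the three sample lines written here, the count printed should be 2.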