Batch: Iterate .csv file columns

Time:03-03

I have this .csv file:

col0,col1,col2,col3,col4
a,1,10,100,1000
b,2,11,101,1001
c,3,12,102,1002
d,4,13,103,1003
e,5,14,105,1004

I need to iterate over each column in the .csv without knowing the number of columns in advance. The first column is skipped because it is not needed. I have this code so far, but it hard-codes four columns; I need a solution that works when the number of columns is unknown. I need the value of each column in a later step, where I calculate something.

@echo off
setlocal enableDelayedExpansion
:: set workspace data
set INPUT_FILE_LOCATION=D:\Scripts\
set CSV_FILE_NAME=test.csv

pushd %INPUT_FILE_LOCATION%
::loop through the csv file
for /F "tokens=2,3,4,5 delims=," %%i in (%CSV_FILE_NAME%) do (
echo %%i,%%j,%%k,%%l
rem echo.%%~i^|END
)
endlocal 

To be more specific: I have a .csv file with some columns and a great many rows. Starting with the second column, I need to take the difference between each pair of consecutive values in every column and check whether at least one difference is greater than 1. The values in each column are in ascending order. Using the csv above as an example, the code should start with col1 and check whether 2-1 > 1, then 3-2 > 1, then 4-3 > 1, then 5-4 > 1; it should then do the same for the next column (col2), and so on until the last column.

If a difference greater than 1 is found, I want to print a message naming the column, using the title of that column from the header line. For example, col3 contains a difference greater than 1 (105-103), so I want to print "there's a difference greater than 1 in col3", where col3 comes from the header.

Over time I will need to add more columns, so the file could have 30 or 40 columns with the same structure as the existing ones.

CodePudding user response:

@ECHO OFF
SETLOCAL
rem The following settings for the source directory and filename are names
rem that I use for testing; they deliberately include spaces to make sure
rem that the process works with such names. Change them to suit your situation.

SET "sourcedir=u:\your files"
SET "filename1=%sourcedir%\q71308045.txt"

:: comma-separated list of columns to ignore
SET "ignorecolumns=1"

:: remove all 'cell' variables from environment
For %%b IN (cell) DO FOR  /F "delims==" %%c In ('set %%b 2^>Nul') DO SET "%%c="

SET /a rowcount=0
SET /a maxcolumns=0



rem usebackq should be omitted if the source filename is not quoted
rem skip=1 skips the first (header) line. Omit to skip no lines
FOR /f "usebackq skip=1 delims=" %%b IN ("%filename1%") DO (
 CALL :process %%b
 CALL :linebyline
)
ECHO %rowcount% rows, maximum %maxcolumns% columns
SET cell
GOTO :EOF

:process
SET /a rowcount+=1
SET /a columns=0

:procloop
IF "%~1"=="" GOTO :eof
SET /a columns+=1
IF DEFINED ignorecolumns FOR %%c IN (%ignorecolumns%) DO IF %columns%==%%c GOTO donecolumn
SET "cell[%rowcount%,%columns%]=%~1"

:donecolumn
IF %columns% gtr %maxcolumns% SET /a maxcolumns=columns
SET /a cellsinrow[%rowcount%]=%columns%
SHIFT
GOTO procloop

GOTO :eof

:: processing line-by-line if required

:linebyline
ECHO row %rowcount% has %columns% columns
GOTO :eof

Here's a generalised solution, in the absence of specifics.

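Note that it does not cater for empty columns: when the line is passed as parameters, consecutive commas are treated as a single separator, so the values after an empty column would shift left.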

Each line is presented to :process as a parameter list, so the commas separate the parameters. :process counts each column and stores its value in the cell[row,column] pseudo-array, omitting any columns listed in ignorecolumns. It also tracks the maximum column number found (maxcolumns) and the number of cells in each row (cellsinrow[row]).

Environment space is limited as far as I am aware, so compensatory measures are needed if huge amounts of data are processed.

The :linebyline routine is executed for each row, so if the required processing does not need cells from earlier rows, rowcount could be set back to 0 within this routine. This has the effect of storing every line in cell[1,*].
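As a rough illustration of that last point, here is a cut-down sketch, not the full script. It assumes two hard-coded rows in place of the file and leaves out the ignorecolumns handling; the only change that matters is the reset of rowcount inside :linebyline.

```batch
@ECHO OFF
SETLOCAL
rem Sketch only: two hard-coded rows stand in for the lines of the csv file.
SET /a rowcount=0
FOR %%r IN ("a,1,10" "b,2,11") DO (
 CALL :process %%~r
 CALL :linebyline
)
rem List the cell[ variables; this script only ever writes row 1,
rem which now holds the values of the last row read.
SET cell[
GOTO :EOF

:process
SET /a rowcount+=1
SET /a columns=0

:procloop
IF "%~1"=="" GOTO :eof
SET /a columns+=1
SET "cell[%rowcount%,%columns%]=%~1"
SHIFT
GOTO procloop

:linebyline
ECHO row %rowcount% has %columns% columns
rem Assumed variation: reset the counter so the next line reuses row 1.
SET /a rowcount=0
GOTO :eof
```

With this change the ECHO reports row 1 for every line, and any row-by-row work has to be done inside :linebyline before the next line overwrites the cells. If a later row is shorter than an earlier one, the extra cells from the earlier row are left behind, so they may need clearing.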

--- revision following clarification

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION 
rem The following settings for the source directory and filename are names
rem that I use for testing; they deliberately include spaces to make sure
rem that the process works with such names. Change them to suit your situation.

SET "sourcedir=u:\your files"
SET "filename1=%sourcedir%\q71308045.txt"

:: comma-separated list of columns to ignore
SET "ignorecolumns=1"

:: remove all 'cell' variables from environment
For %%b IN (cell) DO FOR  /F "delims==" %%c In ('set %%b 2^>Nul') DO SET "%%c="

SET /a rowcount=0
SET /a maxcolumns=0

rem usebackq should be omitted if the source filename is not quoted
rem skip=1 is deliberately not used here: the header line is read as row 1 so that the column names end up in cell[1,*]
FOR /f "usebackq delims=" %%b IN ("%filename1%") DO (
 CALL :process %%b
 CALL :linebyline
)
rem ECHO %rowcount% rows, maximum %maxcolumns% columns
rem SET cell
GOTO :EOF

:process
SET /a rowcount+=1
SET /a columns=0

:procloop
IF "%~1"=="" GOTO :eof
SET /a columns+=1
IF DEFINED ignorecolumns FOR %%c IN (%ignorecolumns%) DO IF %columns%==%%c GOTO donecolumn
SET "cell[%rowcount%,%columns%]=%~1"

:donecolumn
IF %columns% gtr %maxcolumns% SET /a maxcolumns=columns
SET /a cellsinrow[%rowcount%]=%columns%
SHIFT
GOTO procloop

GOTO :eof

:: processing line-by-line if required

:linebyline
:: if rowcount=1 then column names are in cell[1,*] and nothing to do
:: if rowcount=2 then we have the starting data row and nothing to do
IF %rowcount% lss 3 GOTO :eof

:: Now we can compare row 2 to row %rowcount%
FOR /L %%c IN (1,1,%maxcolumns%) DO IF "!cell[2,%%c]!" neq "" CALL :matchcells %%c
:: And move row %rowcount% to row 2; removing row %rowcount% from environment
FOR /L %%c IN (1,1,%maxcolumns%) DO IF "!cell[2,%%c]!" neq ""  SET cell[2,%%c]=!cell[%rowcount%,%%c]!&SET "cell[%rowcount%,%%c]="
GOTO :eof

:: Match cell[2,%1] to cell[%rowcount%,%1]

:matchcells
SET /a celldiff = !cell[%rowcount%,%1]! - !cell[2,%1]!
IF            
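The listing is cut off at the IF above. What follows is a minimal stand-alone sketch of how the check might be finished, assuming the intent is the message asked for in the question. The hard-coded cell values (taken from col3 of the sample data) and the wording of the ECHO are assumptions for illustration, not the answerer's original lines.

```batch
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
rem Hypothetical test data: header name, previous value and current value
rem for column 4, as the main script would have stored them.
SET "cell[1,4]=col3"
SET "cell[2,4]=103"
SET "cell[3,4]=105"
SET /a rowcount=3
CALL :matchcells 4
GOTO :EOF

:: Match cell[2,%1] to cell[%rowcount%,%1]
:matchcells
SET /a celldiff = !cell[%rowcount%,%1]! - !cell[2,%1]!
rem Assumed completion: report the column by its header name from row 1.
IF %celldiff% gtr 1 ECHO there's a difference greater than 1 in !cell[1,%1]!
GOTO :eof
```

Run on its own, this prints the message for col3, since 105 - 103 = 2. In the full script only the three lines from SET /a celldiff to GOTO :eof would belong in :matchcells.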