I am trying to write a Bash script that checks a CSV file and returns the IDs of rows that fail certain criteria. I am using [ -z {$CATEGORY} ] to identify empty (null) cells in the CATEGORY column. However, my if statement does not seem to catch the null value, so I need help. A sample CSV is shown below:
ID,DATE,PRODUCT CODE,CATERGORY
1,01/01/2000,10009,1
2,02/01/2000,9999,2
3,25/01/2000,1009,3
4,15/09/2000,2001,5
5,09/25/2000,2003,4
6,09/10/01,2091,P
7,20/02/2002,3098,6
8,01/03/2003,4097,3
9,03/04/2004,5000,2
10,05/02/2013,4000,1
11,10/01/2015,9,
This is my Bash script. The null value is in the row with ID = 11.
#!/bin/bash
FILE=${1}
IFS=$'\n'
((c=-1))
for row in $(cat $FILE)
do
((c++))
if ((c==0))
then
continue
fi
IFS=','
read ID DATE PRODUCT CATEGORY <<<${row}
if [ -z {$CATEGORY} ];
then
echo "$ID" >> file.txt
fi
done
CodePudding user response:
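-z {$CATEGORY} should be -z "${CATEGORY}": the braces in {$CATEGORY} are literal characters, so the tested string is never empty, even when CATEGORY is. In addition, read ID ... <<<${row} can end up assigning only ID, because the unquoted ${row} may be word-split on the commas before read sees it. Try: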
#!/bin/bash
# Skip the header line, then print the ID of every row whose CATEGORY is empty or blank.
while IFS=, read -r ID DATE PRODUCT CATEGORY; do
  if [[ "$CATEGORY" =~ ^[[:space:]]*$ ]]; then
    echo "$ID"
  fi
done < <( tail -n +2 "$1" ) > file.txt
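To see why the original test never fires, here is a minimal sketch that compares the two forms on an empty value. The variable names braces and quoted are only for illustration and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical demo value: an empty field, like CATEGORY in the row with ID 11.
CATEGORY=""

# {$CATEGORY} expands to the literal string "{}", which is never empty.
if [ -z {$CATEGORY} ]; then braces="empty"; else braces="not empty"; fi

# The quoted expansion tests the value itself.
if [ -z "${CATEGORY}" ]; then quoted="empty"; else quoted="not empty"; fi

echo "braces: $braces"
echo "quoted: $quoted"
```

With an empty CATEGORY, the first test reports "not empty" and the second reports "empty".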
Note that awk or sed would be much faster and simpler for this (see, for instance, https://mywiki.wooledge.org/DontReadLinesWithFor). Example with awk (tested with recent BSD and GNU awk):
awk -F, 'NR>1 && $NF ~ /^[[:space:]]*$/ {print $1}' "$FILE" > file.txt
Example with sed (tested with recent BSD and GNU sed):
sed -En 's/^([^,]*).*,[[:space:]]*$/\1/p' "$FILE" > file.txt
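As a quick sanity check, the two one-liners can be run against a small sample. This is only a sketch: the temporary file and the shortened data set are assumptions made for the demo, and the results are captured in variables instead of file.txt.

```shell
#!/bin/bash
# Hypothetical sample: a shortened version of the CSV from the question,
# written to a temporary file.
FILE=$(mktemp)
printf '%s\n' \
  'ID,DATE,PRODUCT CODE,CATERGORY' \
  '1,01/01/2000,10009,1' \
  '6,09/10/01,2091,P' \
  '11,10/01/2015,9,' > "$FILE"

# awk: skip the header (NR>1) and print the first field when the last one is blank.
awk_ids=$(awk -F, 'NR>1 && $NF ~ /^[[:space:]]*$/ {print $1}' "$FILE")

# sed: print the first field only for lines that end in a comma plus optional whitespace.
sed_ids=$(sed -En 's/^([^,]*).*,[[:space:]]*$/\1/p' "$FILE")

echo "awk: $awk_ids"
echo "sed: $sed_ids"

rm -f "$FILE"
```

For this sample both commands report 11, the only row with an empty last field.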