I am trying to create a script which detects whether files in a directory contain non-UTF-8 characters and, if they do, grabs the file type of that particular file and performs the iconv operation on it.
The code is as follows:
find <directory> |sed '1d'><directory>/filelist.txt
while read filename
do
    file_nm=${filename%%.*}
    ext=${filename#*.}
    echo $filename
    q=`grep -axv '.*' $filename|wc -l`
    echo $q
    r=`file -i $filename|cut -d '=' -f 2`
    echo $r
    #file_repair=$file_nm
    if [ $q -gt 0 ]; then
        iconv -f $r -t utf-8 -c ${file_nm}.${ext} >${file_nm}_repaired.${ext}
        mv ${file_nm}_repaired.${ext} ${file_nm}.${ext}
    fi
done< <directory>/filelist.txt
While running the code, there are several files that turn into 0 byte files and .bak gets appended to the file name.
ls| grep 'bak' | wc -l
36
Where am I making a mistake?
Thanks for the help.
CodePudding user response:
It's really not clear what some parts of your script are supposed to do.
Probably the error is that you are assuming file -i will output a string which always contains =; but it often doesn't.
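To make the point about the = check concrete, here is a minimal sketch of pulling the charset out of a file -i style line and refusing to continue when there is nothing usable. The helper name charset_of and the sample input lines are made up for illustration. Treating binary and unknown-8bit as unusable is an assumption about the labels file(1) prints when it cannot identify a text encoding.

```shell
# Hypothetical helper: print the charset found in a `file -i` style line,
# or return non-zero when there is nothing iconv could use.
charset_of() {
    case $1 in
        *charset=*) cs=${1##*charset=} ;;   # keep only what follows "charset="
        *) return 1 ;;                      # no charset field at all
    esac
    case $cs in
        binary|unknown-8bit) return 1 ;;    # assumed labels for "could not tell"
        *) printf '%s\n' "$cs" ;;
    esac
}

# Illustrative inputs (invented for this sketch, not captured from a real run)
charset_of 'notes.txt: text/plain; charset=iso-8859-1'
charset_of 'photos: inode/directory; charset=binary' ||
    echo 'skip: no usable charset' >&2
```

In a real loop the argument would be "$(file -i "$filename")". If your version of file supports it, file -b --mime-encoding prints only the encoding, which would avoid the string surgery altogether.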
find <directory> |
# avoid temporary file
sed '1d' |
# use IFS='' read -r
while IFS='' read -r filename
do
    # indent loop body
    file_nm=${filename%%.*}
    ext=${filename#*.}
    # quote variables, print diagnostics to stderr
    echo "$filename" >&2
    # use grep -q instead of useless wc -l; don't enter condition needlessly; quote variable
    if grep -qaxv '.*' "$filename"; then
        # indent condition body
        # use modern command substitution syntax, quote variable
        # check if result contains =
        r=$(file -i "$filename")
        case $r in
            *=*)
                # only perform decoding if we can establish encoding
                echo "$r" >&2
                # only replace the original if iconv succeeded
                iconv -f "${r#*=}" -t utf-8 -c "${file_nm}.${ext}" >"${file_nm}_repaired.${ext}" &&
                mv "${file_nm}_repaired.${ext}" "${file_nm}.${ext}" ;;
            *)
                echo "$r: could not establish encoding" >&2 ;;
        esac
    fi
done
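A likely route to the 0-byte files is worth spelling out: the shell creates the redirection target before iconv runs, so if iconv then rejects the encoding name, an empty file is left behind, and moving it over the original destroys the data. A minimal sketch of a safer pattern follows: write to a temporary file and replace the original only when the command succeeds. The helper name rewrite_in_place is hypothetical. It also works on the whole path, which sidesteps splitting the name into file_nm and ext (that split misbehaves for names without a dot and for paths whose directory part contains one).

```shell
# Hypothetical helper: run a filter over a file and replace the file
# only if the filter exits successfully.
# The ( ... ) body runs in a subshell, so target and tmp do not leak out.
rewrite_in_place() (
    target=$1
    shift
    # Temporary file next to the target, so the final mv stays on one filesystem.
    tmp=$(mktemp "$target.XXXXXX") || exit 1
    if "$@" <"$target" >"$tmp"; then
        # Note: the replaced file ends up with the mode mktemp gave the temp file.
        mv -- "$tmp" "$target"
    else
        rm -f -- "$tmp"
        exit 1
    fi
)

# Intended use inside the loop (charset is whatever was detected earlier):
#   rewrite_in_place "$filename" iconv -f "$charset" -t utf-8 -c
```

The filter reads the original on standard input and writes to standard output, which iconv does when it is given no file argument. On failure the original file is left untouched.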
See also Why is testing “$?” to see if a command succeeded or not, an anti-pattern? (tangential, but probably worth reading) and useless use of wc
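The grep regex is kind of mysterious at first glance. With GNU grep in a UTF-8 locale, . does not match bytes that are not valid UTF-8, so grep -axv '.*' selects the lines which contain invalid sequences, which is presumably what you want. Note that this depends on the locale: in the C locale every line matches .* and the test never fires. If you merely wanted to check whether the file contains non-empty lines, grep -qa . "$filename" would do that.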
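Since the stated goal is to find files that are not valid UTF-8, and the grep test depends on GNU grep running in a UTF-8 locale, here is a sketch of a different technique: ask iconv itself to validate the file by converting it from UTF-8 to UTF-8 and looking at the exit status. This is an alternative, not what the original script does, and the helper name is_valid_utf8 is hypothetical.

```shell
# Hypothetical helper: succeed when the whole file decodes as UTF-8.
# It also fails for unreadable files, so treat "not valid" as "needs a closer look".
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 <"$1" >/dev/null 2>&1
}

# Small demonstration on two throwaway files.
tmpdir=$(mktemp -d)
printf 'caf\303\251\n' >"$tmpdir/good.txt"   # e-acute encoded as UTF-8 (bytes C3 A9)
printf 'caf\351\n'     >"$tmpdir/bad.txt"    # e-acute encoded as ISO-8859-1 (byte E9)

for f in "$tmpdir/good.txt" "$tmpdir/bad.txt"; do
    if is_valid_utf8 "$f"; then
        echo "${f##*/}: valid UTF-8"
    else
        echo "${f##*/}: needs conversion"
    fi
done
rm -rf "$tmpdir"
```

Inside the loop this would replace the grep condition with if ! is_valid_utf8 "$filename"; then ... fi, independent of the current locale.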