Goal:
I have a sneaking suspicion that I'm globbing incorrectly, because I haven't been able to find a satisfactory explanation with multiple clear examples of advanced string-and-variable mixing.
The operation I am trying to perform is on the last line, and the goal is to build the output path from outputdirectory, filebasename, and outputextension. Unfortunately, there are too many variables, and despite reading multiple manuals, I feel certain I am making mistakes.
#!/bin/bash
echo Input directory name like ./path/to:
read -r varin
echo Output directory name like ./path/to:
read -r varout
if [ ! -d "${varout}" ]; then
mkdir -p "${varout}";
fi
for file in ${varin}; do pconvert -i "${file}" -o "${varout}"/"${file%%.*}".txt; done
error:
File './inputs/outputs/*/.txt' already exists. Overwrite ? [y/N] ^C
Unexpected behavior:
- I have to write ./inputs/* instead of ./inputs, and this is unexpected. I expected bash to look for a directory and then loop through the files in that directory. This is fine, but it shows that I am not comprehending the code.
- Presuming I type ./inputs/outputs/*, this script tries to create ./inputs/outputs/*.txt on each iteration rather than ./inputs/outputs/inputname.txt.
The goal of the operation on the last line is to strip the directory, strip the extension, and build the new path from outputdirectory, basename, and newextension. Kind of the blind leading the blind, but I feel like this can only have something to do with my use of quotation marks?
Resources I've used:
According to this link, I should probably do something like this:
convertdoc -i "$'{file}'" --pdfconvert -o "$'{outputDir}'/$'{file%%.*}'.odf
But I am getting mixed opinions from friends. So far, I've been told to use no trailing quote, to use only semiquotes, to use quotes both before and after the dollar sign, and to pipe down, to mention a few.
Sample inputs:
$HOME/pdfdl/ardvarks.pdf
$HOME/pdfdl/ants.pdf
$HOME/pdfdl/canines.pdf
$HOME/pdfdl/cats.tmp.pdf
CodePudding user response:
Your script has a few defects. The "for" statement is not doing what you think it is. You haven't given it a pattern to match and expand, so the list has only one item: the value of varin, which is just the directory, not the actual PDF files.
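To make the difference concrete, here is a minimal sketch that counts loop iterations with and without a pattern. The scratch directory and the file names in it are made up for the demonstration; they are not from your setup.

```bash
#!/bin/bash
# Hypothetical scratch directory and file names, used only for this demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/ants.pdf" "$demo_dir/cats.tmp.pdf"

# A bare directory name is a single word, so the loop body runs exactly once.
count_dir=0
for file in "$demo_dir"; do
  count_dir=$((count_dir + 1))
done

# With a pattern, the shell expands the word to one entry per matching file.
count_glob=0
for file in "$demo_dir"/*.pdf; do
  count_glob=$((count_glob + 1))
done

echo "without pattern: $count_dir iteration(s)"
echo "with pattern:    $count_glob iteration(s)"
rm -r "$demo_dir"
```

The loop never looks inside a directory on its own; it only iterates over the words it is given after expansion.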
It isn't completely clear from your question what you are trying to convert, but the list of input filenames clarified that.
I try to use basic tools for Linux, so I use "pdftotext" instead of the two you mentioned above.
As for "${file%%.*}", I prefer to make explicit the actions that implicit forms make too "arcane" for beginners or reviewers. I prefer to see the actual flow of how things are transformed, hence the use of basename in my version of your script below.
#!/bin/sh
START=$(pwd)
# printf is used for the prompts because "echo ... \c" is not portable across shells.
printf 'Input directory name (./path/to) => '
#read -r varin
varin=${START}/TESTin
printf 'Output directory name (./path/to) => '
#read -r varout
varout=${START}/TESTout
if [ ! -d "${varout}" ]; then
mkdir -p "${varout}";
fi
cd "${varin}"
if [ $? -ne 0 ] ; then printf "\n Unable to set '%s' as work directory for input file scanning.\n\n" "${varin}" ; exit 1 ; fi
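for file in *.pdf
do
# Skip the literal pattern "*.pdf" when no PDF files are present.
[ -e "${file}" ] || continue
#pdfconvert -i "${file}" -o "${varout}"/"${file%%.*}".txt
# Strip only the final ".pdf", so cats.tmp.pdf becomes cats.tmp.
BASE=$(basename "${file}" ".pdf")
#pdftotext -eol unix -nopgbrk "${file}" "${varout}/${file%%.*}.txt"
pdftotext -eol unix -nopgbrk "${file}" "${varout}/${BASE}.txt" 2>>errlog
done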
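As a quick check of what basename does with a suffix argument, here is a small sketch using two of the sample paths from the question. Nothing is read from disk; basename only manipulates the strings.

```bash
#!/bin/bash
# Paths taken from the question's sample inputs; the files do not need to exist.
for path in "$HOME/pdfdl/ants.pdf" "$HOME/pdfdl/cats.tmp.pdf"; do
  # basename drops the directory part and, if given, the trailing suffix.
  base=$(basename "$path" .pdf)
  printf '%s -> %s.txt\n' "$path" "$base"
done
```

Only the final ".pdf" is removed, so a name with an extra dot such as cats.tmp.pdf keeps its "cats.tmp" stem.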
CodePudding user response:
Consider using arguments. I would do:
#!/bin/bash
varout=$1
shift
mkdir -p "$varout"
for file in "$@"; do
# https://stackoverflow.com/questions/965053/extract-filename-and-extension-in-bash
filename="${file##*/}"
filename_without_ext="${filename%.*}"
pconvert -i "$file" -o "$varout/$filename_without_ext".txt
done
And then do:
./script.sh /output/dir/ /input/*.pdf
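The point of this form is that the calling shell expands the pattern before the script starts, so "$@" already holds one path per file. A minimal sketch of that, where list_outputs is a hypothetical stand-in for the script that prints the planned output paths instead of running a converter, and the scratch files are invented:

```bash
#!/bin/bash
# list_outputs is a hypothetical stand-in for the script above:
# first argument is the output directory, the rest are input files.
list_outputs() {
  local varout=$1
  shift
  local file filename
  for file in "$@"; do
    filename="${file##*/}"                     # strip the directory part
    # ${varout%/} drops one trailing slash so the printed path has no "//".
    printf '%s/%s.txt\n' "${varout%/}" "${filename%.*}"
  done
}

# Hypothetical scratch inputs for the demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/ants.pdf" "$demo_dir/canines.pdf"

# The pattern is expanded here, by the caller, into two separate arguments.
planned=$(list_outputs /output/dir/ "$demo_dir"/*.pdf)
printf '%s\n' "$planned"
rm -r "$demo_dir"
```

Because each argument is already a single file path, the script itself never has to deal with patterns at all.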
I have to write ./inputs/* instead of ./inputs, and this is unexpected. I expected bash to look for a directory then loop through the files in that directory
I do not understand your confusion. * expands to the list of entries inside a dir. If you type ./inputs, that's just ./inputs; when you type ./inputs/*, the unquoted ${varin} in the for loop expands to the list of files. I would find it unexpected if both meant the same.
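Here is a minimal sketch of that expansion, assuming a pattern has been stored in a variable the way read -r would store it. The scratch directory is hypothetical, and the sketch assumes its path contains no whitespace.

```bash
#!/bin/bash
# Hypothetical scratch directory; assumed to have no spaces in its path.
demo_dir=$(mktemp -d)
touch "$demo_dir/ants.pdf" "$demo_dir/canines.pdf"

# The variable holds the pattern as plain text; nothing is expanded yet.
varin="$demo_dir/*"

# Unquoted: after variable expansion the shell performs pathname expansion.
set -- $varin
unquoted_count=$#

# Quoted: the pattern stays one literal word, asterisk included.
set -- "$varin"
quoted_count=$#

echo "unquoted: $unquoted_count word(s), quoted: $quoted_count word(s)"
rm -r "$demo_dir"
```

This is why the loop in the question only iterates over files when the typed value itself contains the asterisk and the variable is left unquoted.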
Additionally, ${file%%.*} gives the wrong result when the path contains another dot. It removes the longest suffix that matches .* from the value. When file=./anything/file.txt, then echo "${file%%.*}" outputs an empty string: because the value starts with a dot, .* matches everything.
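The four relevant expansions side by side, as a small sketch. The path is made up, modelled on the cats.tmp.pdf sample from the question.

```bash
#!/bin/bash
# Hypothetical path with a leading "./" and two dots in the file name.
file=./anything/cats.tmp.pdf

longest="${file%%.*}"     # longest suffix matching .* removed: empty string
shortest="${file%.*}"     # shortest suffix matching .* removed: ./anything/cats.tmp
name="${file##*/}"        # longest prefix matching */ removed: cats.tmp.pdf
stem="${name%.*}"         # extension removed from the bare name: cats.tmp

printf 'longest=[%s]\nshortest=[%s]\nname=[%s]\nstem=[%s]\n' \
  "$longest" "$shortest" "$name" "$stem"
```

Stripping the directory first and then removing the shortest .* suffix is the combination that survives both a leading ./ and extra dots in the name.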
Presuming I type ./inputs/outputs/*, this script tries to create ./inputs/outputs/*.txt
No, the error message suggests it tries to create ./inputs/outputs/*/.txt.
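One way to arrive at exactly that path, as a sketch: it assumes the second prompt was answered with ./inputs/outputs/* and that the loop variable began with ./ (both values below are guesses for illustration, not taken from your session).

```bash
#!/bin/bash
# Both values are assumptions made for this illustration.
varout='./inputs/outputs/*'
file='./inputs/ants.pdf'

# Build the -o argument exactly as the question's last line does.
# "${varout}" is quoted, so its asterisk is kept literally, and
# "${file%%.*}" is empty because the value starts with a dot.
set -- "${varout}"/"${file%%.*}".txt
target=$1

printf '%s\n' "$target"
```

Under those assumptions the quoting is doing its job; the surprising path comes from the typed asterisk and from %%.* swallowing the whole input name.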
I do not understand how you would want the output path to expand a glob expression. As you stated, "The goal in the ... use the new path", not multiple new paths, which * would expand to.
According to this link, I should probably do something like this: convertdoc -i "$'{file}'" --pdfconvert -o "$'{outputDir}'/$'{file%%.*}'.odf
A quoting style "$'{something}'" was never used in that link. Consider re-reading it.