I have a bash script which does a lot of string manipulation. As far as I know, reading from a file is slow, so instead of doing it every time I need the contents, I read the whole file once at the beginning of the script:
readarray -t lines < "$filename"
But every time I need to feed the lines to a program that reads standard input (e.g., awk, cut, grep), I still have to print them and create a pipeline. Here's an example which finds the first line in the file that contains a colon:
line=$(printf -- '%s\n' "${lines[@]}" | grep -n -m 1 :)
So I started wondering: didn't I just make it slower by adding an extra printf call and a pipeline? What's the best way to handle this situation?
CodePudding user response:
If what you want is to see which is faster, you can wrap each command in time:
time readarray -t lines < "$filename"
time line=$(printf -- '%s\n' "${lines[@]}" | grep -n -m 1 :)
That will report the real, user, and system time each command takes, and will let you see which one is faster.
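To compare the two approaches from the question end to end, a rough sketch like the following times each variant; the file name is just a placeholder, and bash's time keyword also accepts a brace group:

# placeholder file name; substitute your own
filename=somefile.txt

# variant 1: let grep read the file directly
time grep -n -m 1 : "$filename" > /dev/null

# variant 2: read into an array first, then pipe the lines to grep
time {
    readarray -t lines < "$filename"
    printf -- '%s\n' "${lines[@]}" | grep -n -m 1 : > /dev/null
}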
CodePudding user response:
You can use the bash-specific <<< here-string operator to feed variables to a command's standard input without echo/printf-ing them:
λ printf "test\nline\n" > file
λ cat file
test
line
λ readarray -t lines < file
λ wc -c <<< "${lines[0]}"
5
λ printf "%s" "${lines[0]}"
test
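Applied to the example from the question, a here-string would replace the explicit pipeline. Note that the command substitution still runs in a subshell, so whether this is actually faster is something you'd have to check with time:

line=$(grep -n -m 1 : <<< "$(printf -- '%s\n' "${lines[@]}")")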
Also, instead of reading the file into a variable, you could consume it directly with something like this, assuming you don't need all of the contents at once:
while read -r line; do
    # feed just this one line to grep via a here-string
    grep -n -m1 ':' <<< "$line" && {
        echo "Got colon"
        break
    }
done < "$filename"
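If the goal is to avoid spawning grep once per line, a pure-bash sketch of the same loop could use the [[ ... == *:* ]] pattern match instead, assuming "$filename" from the question and tracking the line number by hand to mimic grep -n:

lineno=0
while IFS= read -r line; do
    lineno=$(( lineno + 1 ))
    # plain bash pattern match: does this line contain a colon?
    if [[ $line == *:* ]]; then
        printf '%d:%s\n' "$lineno" "$line"
        break
    fi
done < "$filename"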