Assuming I have some output or a file like
1,a,info
2,b,inf
3,c,in
I want to run a while loop with read:
while read r ; do
echo "$r";
# extract line to $arr as array separated by ','
# call some program (e.g. md5sum, echo ...) on one item of arr
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
I would like to use readarray and while, but compelling alternatives are welcome too.
There is a specific way to have readarray (mapfile) behave correctly with process substitution, but I keep forgetting it. This is intended as a Q&A, so an explanation would be nice.
CodePudding user response:
Since compelling alternatives are welcome too, and assuming you're just trying to populate arr one line at a time:
$ cat tst.sh
#!/usr/bin/env bash
while IFS=',' read -r -a arr ; do
# extract line to $arr as array separated by ','
# echo the first item of arr
echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
$ ./tst.sh
1
2
3
or if you also need each whole input line in a separate variable r:
$ cat tst.sh
#!/usr/bin/env bash
while IFS= read -r r ; do
# extract line to $arr as array separated by ','
# echo the first item of arr
IFS=',' read -r -a arr <<< "$r"
echo "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
$ ./tst.sh
1
2
3
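If, as the question suggests, you want to call a program such as md5sum on one field, the same pattern applies; a minimal sketch (md5sum is just the example command the question names):

```shell
#!/usr/bin/env bash
while IFS=',' read -r -a arr; do
  # run md5sum on the third field of each record
  printf '%s' "${arr[2]}" | md5sum
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
```

This prints one checksum line per input record.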
But bear in mind "Why is using a shell loop to process text considered bad practice?" anyway.
CodePudding user response:
If the loadable builtin csv is available/acceptable, something like:
help csv
csv: csv [-a ARRAY] string
Read comma-separated fields from a string.
Parse STRING, a line of comma-separated values, into individual fields,
and store them into the indexed array ARRAYNAME starting at index 0.
If ARRAYNAME is not supplied, "CSV" is the default array name.
The script.
#!/usr/bin/env bash
enable csv || exit
while IFS= read -r line && csv -a arr "$line"; do
printf '%s\n' "${arr[0]}"
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
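What the csv builtin buys you over plain IFS splitting is correct handling of quoted fields. A guarded sketch (the quoted input string is my own example; it falls back to read -a when the loadable isn't installed):

```shell
#!/usr/bin/env bash
input='1,"a,with,commas",info'
if enable csv 2>/dev/null; then
  # csv understands the quoting: the middle field stays "a,with,commas"
  csv -a arr "$input"
else
  # plain IFS splitting breaks the quoted field apart at every comma
  IFS=',' read -r -a arr <<< "$input"
fi
declare -p arr
```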
- See help enable
- With bash 5.2 there is a default path for the loadables, set in config-top.h and configurable at compile time: BASH_LOADABLES_PATH
CodePudding user response:
The solution is readarray -t -d, arr < <(printf "%s," "$r")
The special part is < <(...). Neither the tldp page on process substitution nor SS64 gives a proper reason why it first needs a redirection arrow and then the process substitution.
My understanding is this: <(...) expands to the name of a named pipe (or /dev/fd entry), not to its contents, and readarray only reads from stdin or a file descriptor. Putting the substitution behind < makes bash treat it like any other file input and redirect it into stdin. Because readarray then runs in the current shell (no pipeline, no subshell), the array survives the command.
example:
while IFS= read -r r ; do
echo "$r";
readarray -t -d, arr < <(printf "%s," "$r");
echo "${arr[0]}";
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
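To see the subshell effect that the arrow-plus-substitution avoids, compare a plain pipe into readarray with the < <(...) form, a minimal sketch:

```shell
#!/usr/bin/env bash
line='1,a,info'

# A pipe runs readarray in a subshell: arr vanishes with it
printf '%s,' "$line" | readarray -t -d, arr
echo "after pipe:  ${#arr[@]} elements"

# Process substitution behind < keeps readarray in the current shell
readarray -t -d, arr < <(printf '%s,' "$line")
echo "after < <(): ${#arr[@]} elements"
```

The first echo reports an empty array, the second reports the three fields.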
Anyway, this is mainly a reminder for myself, because I keep forgetting it, and readarray is the only place where I actually need it.
The question was also answered mostly here, here (why the pipe isn't working) and somewhat here, but those answers are difficult to find and their reasoning is hard to comprehend.
For example, the shopt -s lastpipe solution is not clear at first. It turns out that in bash every element of a pipeline normally runs in a subshell, so state changes (like filling arr) have no effect on the rest of the program. This option makes the last element of a pipeline execute in the current shell instead (it only takes effect when job control is off, so not in a normal interactive shell).
shopt -s lastpipe;
while IFS= read -r r ; do
echo "$r";
printf "%s," "$r" | readarray -t -d, arr;
echo "${arr[0]}";
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
One alternative to lastpipe would be to do all the work inside the subshell:
while IFS= read -r r ; do
echo "$r";
printf "%s," "$r" | {
readarray -t -d, arr ;
echo "${arr[0]}";
}
done <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
CodePudding user response:
readarray (mapfile) and read -a disambiguation
First: readarray == mapfile
help readarray
readarray: readarray [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
    Read lines from a file into an array variable.
    A synonym for `mapfile'.
Then:
help mapfile
mapfile: mapfile [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
    Read lines from the standard input into an indexed array variable.

    Read lines from the standard input into the indexed array variable ARRAY,
    or from file descriptor FD if the -u option is supplied. The variable
    MAPFILE is the default ARRAY.

    Options:
      -d delim     Use DELIM to terminate lines, instead of newline
      -n count     Copy at most COUNT lines. If COUNT is 0, all lines are copied
      -O origin    Begin assigning to ARRAY at index ORIGIN. The default index is 0
      -s count     Discard the first COUNT lines read
      -t           Remove a trailing DELIM from each line read (default newline)
      -u fd        Read lines from file descriptor FD instead of the standard input
      -C callback  Evaluate CALLBACK each time QUANTUM lines are read
      -c quantum   Specify the number of lines read between each call to CALLBACK
    ...
While read -a:
help read
read: read [-ers] [-a array] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name ...]
    Read a line from the standard input and split it into fields.

    Reads a single line from the standard input, or from file descriptor FD
    if the -u option is supplied. The line is split into fields as with word
    splitting, and the first word is assigned to the first NAME, the second
    word to the second NAME, and so on, with any leftover words assigned to
    the last NAME. Only the characters found in $IFS are recognized as word
    delimiters.
    ...
    Options:
      -a array     assign the words read to sequential indices of the array
                   variable ARRAY, starting at zero
    ...
Note: Only the characters found in $IFS are recognized as word delimiters. Useful with the -a flag!
Create an array from a split string
For creating an array by splitting a string, you could either:
IFS=, read -ra myArray <<<'A,1,spaced string,42'
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")
Or use mapfile, but as this command is intended to work on whole files, the syntax is somewhat counter-intuitive:
mapfile -td, myArray < <(printf %s 'A,1,spaced string,42')
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")
Or, if you want to avoid the fork implied by < <(printf ...), you have to:
mapfile -td, myArray <<<'A,1,spaced string,42'
myArray[-1]=${myArray[-1]%$'\n'}
declare -p myArray
declare -a myArray=([0]="A" [1]="1" [2]="spaced string" [3]="42")
This will be a little quicker, but not more readable...
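The myArray[-1]=${myArray[-1]%$'\n'} line is needed because the here-string appends a newline, which mapfile keeps in the last field; a small demonstration:

```shell
#!/usr/bin/env bash
mapfile -td, myArray <<<'A,1,spaced string,42'
declare -p myArray   # last element is $'42\n' - the here-string's newline
myArray[-1]=${myArray[-1]%$'\n'}
declare -p myArray   # now a clean "42"
```

-t only strips the trailing delimiter (here a comma), so the newline on the final field has to be removed by hand.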
For your sample:
mapfile -t rows <<HEREDOC
1,a,info
2,b,inf
3,c,in
HEREDOC
for row in "${rows[@]}"; do
IFS=, read -r -a cols <<<"$row"
declare -p cols
done
declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")
for row in "${rows[@]}"; do
IFS=, read -r -a cols <<<"$row"
printf ' %s | %s\n' "${cols[0]}" "${cols[2]}"
done
1 | info
2 | inf
3 | in
Or even, if you really want to use readarray:
for row in "${rows[@]}"; do
readarray -td, cols <<<"$row"
cols[-1]=${cols[-1]%$'\n'}
declare -p cols
done
declare -a cols=([0]="1" [1]="a" [2]="info")
declare -a cols=([0]="2" [1]="b" [2]="inf")
declare -a cols=([0]="3" [1]="c" [2]="in")
Playing with the callback option (I added some spaces to the last line):
testfunc() {
local IFS array cnt line
read -r cnt line <<< "$@"
IFS=,
read -r -a array <<< "$line"
printf '[%2s]: %3s | %3s :: %s\n' "$cnt" "${array[@]}"
}
mapfile -t -C testfunc -c 1 <<HEREDOC
1,a,info
2,b,inf
3,c d,in fo
HEREDOC
[ 0]:   1 |   a :: info
[ 1]:   2 |   b :: inf
[ 2]:   3 | c d :: in fo
Same, with the -u flag. Open the file descriptor:
exec {mydoc}<<HEREDOC
1,a,info
2,b,inf
3,c d,in fo
HEREDOC
Then
mapfile -u "$mydoc" -C testfunc -c 1
[ 0]:   1 |   a :: info
[ 1]:   2 |   b :: inf
[ 2]:   3 | c d :: in fo
And finally close the file descriptor:
exec {mydoc}<&-
About the bash csv module
For further information about enable -f /path/to/csv csv, RFCs and limitations, have a look at my previous post: How to parse a CSV file in Bash?