I dynamically iterate through a csv file and select columns that fit the criteria I need. My CSV is separated by commas. I save these indexes to an array that looks like
echo "${cols_needed[@]}"
1 3 4 7 8
I then need to write these columns to a new file and I've tried the following cut and awk commands, however, as the array is dynamically created, I cant seem to find the right commands that can select them all at once. I have tried cut, awk and paste commands.
awk -v fields=${cols_needed[@]} 'BEGIN{ n = split(fields,f) }
{ for (i=1; i<=n; i) printf "%s%s", $f[i], (i<n?OFS:ORS) }' test.csv
This throws an error as it cannot split the fields unless I hard code them (even then, it can only do 2), split on spaces.
fields="1 2’
I have tried to dynamically create -f parameters, but can only do so with one variable in a loop like so
for item in "${cols_needed[@]}";
do
cat test.csv | cut -f$item
done
which outputs one column at a time.
And I have tried to dynamically create it with commas - input as 1,3,4,7...
cat test.csv | cut -f${cols_needed[@]};
which also does not work!
Any help is appreciated! I understand awk does not work like bash and we cannot pass variables around in the same way. I feel like I'm going around in circles a bit! Thanks in advance.
CodePudding user response:
Your first approach is ok, just:
- change
-v fields=${cols_needed[@]}
to-v fields="${cols_needed[*]}"
, to pass the array as a single shell word - add
FS=OFS=","
to BEGIN, after splitting (you want to split on spaces, before FS is changed to,
) - ie.
BEGIN {n = split(fields, f); FS=OFS=","}
Also, if there are no commas embedded in quoted csv fields, you can use cut
:
IFS=,; cut -d, -f "${cols_needed[*]}" test.csv
If there are embedded commas, you can use gawk
's FPAT
, to only split fields on unquoted commas.
Here's an example using that.
# prepend $ to each number
for i in "${cols_needed[@]}"; do
fields[j ]="\$$i"
done
IFS=,
gawk -v FPAT='([^,] )|(\"[^\"] \")' -v OFS=, "{print ${fields[*]}}"
Injecting shell code in to an awk command is generally not great practice, but it's ok here IMO.
CodePudding user response:
Expanding on my comments re: passing the bash
array into awk
:
Passing the array in as an awk
variable:
$ cols_needed=(1 3 4 7 8)
$ typeset -p cols_needed
declare -a cols_needed=([0]="1" [1]="3" [2]="4" [3]="7" [4]="8")
$ awk -v fields="${cols_needed[*]}" 'BEGIN{n=split(fields,f); for (i=1;i<=n;i ) print i,f[i]}'
1 1
2 3
3 4
4 7
5 8
Passing the array in as a 'file' via process substitution:
$ awk 'FNR==NR{f[ n]=$1;next} END {for (i=1;i<=n;i ) print i,f[i]}' <(printf "%s\n" "${cols_needed[@]}")
1 1
2 3
3 4
4 7
5 8
As for OP's main question of extracting a specific set of columns from a .csv file ...
Borrowing dawg's .csv file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
Expanding on the suggestion for passing the bash
array in as an awk
variable:
awk -v fields="${cols_needed[*]}" '
BEGIN { FS=OFS=","
n=split(fields,f," ")
}
{ pfx=""
for (i=1;i<=n;i ) {
printf "%s%s", pfx, $(f[i])
pfx=OFS
}
print ""
}
' file.csv
NOTE: this assumes OP has provided a valid list of column numbers; if there's some doubt as to the validity of the input (column) numbers then OP can add some logic to address said doubts (eg, are they integers? are they positive integers? do they reference a field (in file.csv
) that actually exists?, etc)
This generates:
1,3,4,7,8
11,13,14,17,18
21,23,24,27,28
CodePudding user response:
Suppose you have this variable in bash:
$ echo "${cols_needed[@]}"
3 4 7 8
And this CSV file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
You can select columns of that csv file in awk this way:
awk '
BEGIN{FS=OFS=","}
FNR==NR{split($0, cols," "); next}
{
s=""
for (e=1;e<=length(cols); e )
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' <(echo "${cols_needed[@]}") file.csv
Prints:
3,4,7,8
13,14,17,18
23,24,27,28
Or, you can do:
awk -v cw="${cols_needed[*]}" '
BEGIN{FS=OFS=","; split(cw, cols," ")}
{
s=""
for (e=1;e<=length(cols); e )
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' file.csv
# same output
BTW, you can do this entirely with cut
:
cut -d ',' -f $(IFS=, ; echo "${cols_needed[*]}") file.csv
3,4,7,8
13,14,17,18
23,24,27,28