I have a CSV file like this
>cat test.in
1|aaa|bbb
1|ccc|ddd
2|xxx|yyy
2|www|zzz
2|www|ttt
and I want to split it into separate files:
- the naming convention should be
prefix.FIELD1.FIELD2.out
- FIELD1 should not be in the output file
- every output file should have a header
Is there a neat way to do it in one go with awk?
So far I've managed to have awk create the output files, but I can't make it add the header, so I just loop over the output files and add it afterwards:
>cat script.sh
#!/bin/bash
FIELD_SEPARATOR="|"
OUTPUT_HEADER="Key|Value"
awk -v sep="${FIELD_SEPARATOR}" 'BEGIN{FS=OFS=sep} {print $2,$3 > ("prefix." $1 "." $2 ".out")}' test.in
# add the header to all the output files
echo $OUTPUT_HEADER > header
for filename in $(ls prefix.*.out 2>/dev/null); do
cat header $filename > $filename.tmp && mv $filename.tmp $filename
done
rm header
which gives the expected output
>ls prefix.*.out
prefix.1.aaa.out prefix.1.ccc.out prefix.2.www.out prefix.2.xxx.out
>cat prefix.1.aaa.out
Key|Value
aaa|bbb
>cat prefix.1.ccc.out
Key|Value
ccc|ddd
>cat prefix.2.www.out
Key|Value
www|zzz
www|ttt
>cat prefix.2.xxx.out
Key|Value
xxx|yyy
CodePudding user response:
NO - nononono.... Don't do this -
for filename in $(ls prefix.*.out 2>/dev/null)
Do this -
for filename in prefix.*.out
c.f. https://mywiki.wooledge.org/ParsingLs
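One thing the glob form changes: if nothing matches, the shell leaves the pattern as a literal string, which is what the `2>/dev/null` in the original was hiding. A minimal sketch of one common guard, assuming a POSIX-style shell; the scratch directory and the sample file names are made up for illustration:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched (illustrative only).
cd "$(mktemp -d)" || exit 1
touch 'prefix.1.aaa.out' 'prefix.2.x y.out'   # hypothetical files; one name contains a space

count=0
for filename in prefix.*.out; do
    [ -e "$filename" ] || continue   # no match: the literal pattern is not a file, skip it
    count=$((count + 1))
    printf 'found: %s\n' "$filename"
done

# Same loop with a pattern that matches nothing: the body never does any work.
missing=0
for filename in nomatch.*.out; do
    [ -e "$filename" ] || continue
    missing=$((missing + 1))
done
printf 'matched=%d unmatched=%d\n' "$count" "$missing"
```

In bash, `shopt -s nullglob` is an alternative that makes an unmatched pattern expand to nothing, so the `-e` test is not needed.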
But your question, as I read it, is basically whether you can skip that whole structure and just have awk handle that for you as well, and the answer is yes, you certainly can, and all in one pass.
$: awk -v sep='|' 'BEGIN{FS=OFS=sep} { f="prefix."$1"."$2".out"; if (!seen[f]++) print "Key","Value" > f; print $2,$3 > f; }' test.in
$: grep . prefix*
prefix.1.aaa.out:Key|Value
prefix.1.aaa.out:aaa|bbb
prefix.1.ccc.out:Key|Value
prefix.1.ccc.out:ccc|ddd
prefix.2.www.out:Key|Value
prefix.2.www.out:www|zzz
prefix.2.www.out:www|ttt
prefix.2.xxx.out:Key|Value
prefix.2.xxx.out:xxx|yyy
CodePudding user response:
A simple way to do this in awk is to keep an array of the filenames created. If the filename isn't already in the array, output the header, then append field 2 and field 3 as the contents. Checking that the number of fields is 3 helps skip blank or malformed lines.
You can write your script as:
awk -F"|" '
  BEGIN { hdr="Key|Value"; OFS=FS }
  NF==3 {
    ofn="prefix." $1 "." $2 ".out"
    if (! (ofn in arr)) {
      print hdr > ofn
    }
    arr[ofn] = 1
    print $2,$3 >> ofn
  }
' test.in
Or if you like long 1-liners:
awk -F"|" 'BEGIN {hdr="Key|Value"; OFS=FS} NF==3 { ofn="prefix." $1 "." $2 ".out"; if (! (ofn in arr)) { print hdr > ofn } arr[ofn] = 1; print $2,$3 >> ofn }' test.in
Example Use/Output
$ awk -F"|" 'BEGIN {hdr="Key|Value"; OFS=FS} NF==3 { ofn="prefix." $1 "." $2 ".out"; if (! (ofn in arr)) { print hdr > ofn } arr[ofn] = 1; print $2,$3 >> ofn }' test.in
Result:
$ l
total 28
drwxr-xr-x 2 david david 4096 Nov 29 14:07 .
drwxr-xr-x 7 david david 4096 Nov 29 13:57 ..
-rw-r--r-- 1 david david 18 Nov 29 14:07 prefix.1.aaa.out
-rw-r--r-- 1 david david 18 Nov 29 14:07 prefix.1.ccc.out
-rw-r--r-- 1 david david 36 Nov 29 14:07 prefix.2.www.out
-rw-r--r-- 1 david david 18 Nov 29 14:07 prefix.2.xxx.out
-rw-r--r-- 1 david david 50 Nov 29 13:58 test.in
with, e.g.
$ for i in prefix*; do printf "\nfile: %s\n" "$i"; cat "$i"; done
file: prefix.1.aaa.out
Key|Value
aaa|bbb
file: prefix.1.ccc.out
Key|Value
ccc|ddd
file: prefix.2.www.out
Key|Value
www|zzz
www|ttt
file: prefix.2.xxx.out
Key|Value
xxx|yyy
A single awk command is all you need. Let me know if you have questions.
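If the input has many distinct FIELD1/FIELD2 pairs, some awk implementations can run out of open file descriptors, because every output file stays open until awk exits. A possible variation on the array approach above, offered as a sketch rather than a drop-in replacement: close the previous file whenever the target changes, and reopen in append mode. The names `seen` and `prev` and the scratch directory are illustrative assumptions; the sample input is recreated so the snippet runs on its own.

```shell
#!/bin/sh
# Recreate the sample input in a scratch directory (illustrative only).
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1|aaa|bbb' '1|ccc|ddd' '2|xxx|yyy' '2|www|zzz' '2|www|ttt' > test.in

awk -F'|' -v hdr='Key|Value' '
BEGIN { OFS = FS }
NF == 3 {
    ofn = "prefix." $1 "." $2 ".out"
    if (!(ofn in seen)) {
        seen[ofn] = 1
        print hdr > ofn      # first sighting: truncate any stale file, write the header
        close(ofn)           # close it so every data write below uses append mode
    }
    if (ofn != prev) {
        if (prev != "") close(prev)   # keep at most one data file open at a time
        prev = ofn
    }
    print $2, $3 >> ofn      # append; reopens the file if it was closed earlier
}' test.in

cat prefix.2.www.out
```

When the input is grouped by key, as in the sample, each file is opened only a couple of times. With unsorted input it still works, but files are reopened more often, so sorting first may be faster.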