I have a file named fulldata.tmp which contains pipe delimited data (I can change it to comma if needed but generally like using pipe). With a BASH Shell script I would like to split lines out to new files based on the value in column 1 and retain the header. I'm pulling this data via SQL so I can pre-sort if needed but I don't have direct access to the terminal running this script so development and debugging is difficult. I've searched dozens of examples mostly recommending awk but I'm not connecting the dots. This is my core need and below are a couple quality of life options I'd like if it's easy along with example data.
Nice if possible: I would like to specify which columns print to the new files (my example desired output shows I want columns 1-4 out of the initial 5 columns).
Nice if possible: I would like the new files named with a prefix then the data that is being split on followed by extension: final_$col1.csv
GROUPID|LABEL|DATE|ACTIVE|COMMENT
ABC|001|2022-09-15|True|None
DEF|001|2022-09-16|False|None
GHI|002|2022-10-17|True|Future
final_ABC.csv
ABC|001|2022-09-15|True
final_DEF.csv
DEF|001|2022-09-16|False
final_GHI.csv
GHI|002|2022-10-17|True
CodePudding user response:
Maybe awk
awk -F'|' -v OFS='|' 'NR>1{print $1, $2, $3, $4 > "final_"$1".csv"}' fulldata.tmp
Check the created csv files and it's content.
tail -n 1 final*.csv
Output
==> final_ABC.csv <==
ABC|001|2022-09-15|True
==> final_DEF.csv <==
DEF|001|2022-09-16|False
==> final_GHI.csv <==
GHI|002|2022-10-17|True
CodePudding user response:
rquery (https://github.com/fuyuncat/rquery/releases) can do this straightforward
[ rquery]$ cat samples/file1.csv
GROUPID|LABEL|DATE|ACTIVE|COMMENT
ABC|001|2022-09-15|True|None
DEF|001|2022-09-16|False|None
GHI|002|2022-10-17|True|Future
[ rquery]$ ./rq -n -i "|" -q "p d/\|/ | s @1,@2,@3,@4 | >'/tmp/file_' @1 '.csv'" samples/file1.csv
[ rquery]$ cat /tmp/file_ABC.csv
ABC|001|2022-09-15|True
[ rquery]$ cat /tmp/file_DEF.csv
DEF|001|2022-09-16|False
[ rquery]$ cat /tmp/file_GHI.csv
GHI|002|2022-10-17|True
CodePudding user response:
This works and preserves the header which I believe was a requirement.
cut -d '|' -f 1 fulldata.tmp | grep -v GROUPID | sort -u | while read -r id; do grep -E "^${id}|^GROUPID" fulldata.tmp > final_${id}.csv; done
I think a pure awk
solution is better though.