I am trying to convert this input from file.txt:
a,b;c^d"e}
f;g,h!;i8j-
into this output:
a,b,c,d,e,f,g,h,i,j
using awk. The best I have managed so far is:
awk '$1=$1' FS="[!;^}8-]" OFS="," file.txt
Two problems remain:
- How can I stop " from being interpreted as a special character? My attempt at escaping it doesn't work.
- How can I avoid duplicate ,, in the output and delete the last , ?
CodePudding user response:
If you only want to replace non-letter characters with commas and squeeze repeated commas, tr is your friend:
tr -sc '[:alpha:]' ','
This leaves a trailing comma, though. You could use sed to replace it with a newline:
tr -sc '[:alpha:]' ',' | sed 's/,$/\n/'
Another possibility is to split each "item" onto its own line (with tr or grep -o), then use paste to combine the lines again:
tr -sc '[:alpha:]' '\n' | paste -sd,
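To try both pipelines on the sample from the question, a minimal sketch could look like the following. It feeds the two input lines on standard input instead of reading file.txt, and the variable names input, with_sed and with_paste are only there for illustration. It assumes tr pads the shorter second set with its last character, as GNU and BSD tr do; the sed call here simply deletes the trailing comma, and paste gets an explicit - operand for standard input.

```shell
# Sample input from the question (two lines).
input='a,b;c^d"e}
f;g,h!;i8j-'

# Variant 1: turn every run of non-letters into a single comma,
# then delete the trailing comma.
with_sed=$(printf '%s\n' "$input" | tr -sc '[:alpha:]' ',' | sed 's/,$//')

# Variant 2: put each run of letters on its own line,
# then join the lines with commas.
with_paste=$(printf '%s\n' "$input" | tr -sc '[:alpha:]' '\n' | paste -sd, -)

echo "$with_sed"
echo "$with_paste"
```

Both commands should print a,b,c,d,e,f,g,h,i,j for this input.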
CodePudding user response:
I would use GNU AWK for this task in the following way. Let the content of file.txt be
a,b;c^d"e} f;g,h!;i8j-
then
awk 'BEGIN{FPAT="[a-z]";OFS=","}{$1=$1;print}' file.txt
gives output
a,b,c,d,e,f,g,h,i,j
Explanation: I inform GNU AWK that a field is a single lowercase ASCII letter using FPAT, and that the output field separator (OFS) is a comma. Then, for each line, I do $1=$1 to trigger a rebuild of the line and print it.
(tested in GNU Awk 5.0.1)
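FPAT is a GNU AWK extension, and the command above prints one output line per input line. If the input is spread over two lines as in the question, or only a POSIX awk is available, a rough sketch of the same idea is to pull out one letter at a time with match() and join everything at the end. The variable names line, out, sep and joined are made up for this example.

```shell
joined=$(printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-' |
awk '{
    line = $0
    # Take one lowercase letter at a time, much like FPAT="[a-z]" would.
    while (match(line, /[a-z]/)) {
        out = out sep substr(line, RSTART, RLENGTH)
        sep = ","                               # separator before every later letter
        line = substr(line, RSTART + RLENGTH)   # continue after the match
    }
}
END { print out }')

echo "$joined"
```

Because out is only printed in the END block, letters from all input lines end up on a single output line.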
CodePudding user response:
$ awk -v RS="^$" '{ # read whole file in
gsub(/[^a-z]+/,",") # replace each run of characters that are not lowercase letters with a comma
sub(/,$/,"") # remove trailing comma
}1' file # output
Output:
a,b,c,d,e,f,g,h,i,j
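Reading the whole file with RS="^$" relies on GNU AWK treating RS as a regular expression. As a hedged alternative for other awks, the sketch below applies the same gsub/sub idea line by line and joins the results in an END block; the names out and portable are only illustrative.

```shell
portable=$(printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-' |
awk '{
    gsub(/[^a-z]+/, ",")     # runs of non-letters become one comma
    sub(/^,/, "")            # drop a leading comma on this line
    sub(/,$/, "")            # drop a trailing comma on this line
    # Append this line, adding a comma only between non-empty pieces.
    out = out (out != "" && $0 != "" ? "," : "") $0
}
END { print out }')

echo "$portable"
```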