Create a parser script with a delimiter

I am trying to convert this input from file.txt

a,b;c^d"e}
f;g,h!;i8j-

into this output

a,b,c,d,e,f,g,h,i,j

with awk

The best I did so far is

awk '$1=$1' FS="[!;^}8-]" OFS="," file.txt

How can I stop " from being interpreted as a special character? " doesn't work.
I also need to avoid duplicate ,, in the output and delete the last ,

CodePudding user response：

If you only want to replace non-letter characters with commas and squeeze repeated commas, tr is your friend:

tr -sc '[:alpha:]' ','

This will leave a trailing comma though. You could use sed to remove/replace it:

tr -sc '[:alpha:]' ',' | sed 's/,$/\n/'

Another possibility is to split each "item" into its own line (with tr or grep -o), then use paste to combine the lines again:

tr -sc '[:alpha:]' '\n' | paste -sd,
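To try both variants without a file, here is a small sketch that feeds the sample input from the question through stdin. The function names (sample, squeeze_with_sed, split_and_paste) are made up for illustration, and the explicit - after paste is an assumption for portability, since some paste implementations require an operand to read standard input.

```shell
# Sample input from the question, two lines on stdin.
sample() { printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-'; }

# Variant 1: squeeze every run of non-letters (newlines included) into one comma,
# then drop the single trailing comma.
squeeze_with_sed() { tr -sc '[:alpha:]' ',' | sed 's/,$//'; }

# Variant 2: put each run of letters on its own line, then join the lines with commas.
# The "-" tells paste to read standard input.
split_and_paste() { tr -sc '[:alpha:]' '\n' | paste -sd, -; }

sample | squeeze_with_sed; echo   # echo supplies the final newline that tr removed
sample | split_and_paste
```

Both pipelines should print a,b,c,d,e,f,g,h,i,j for this input. Note that [:alpha:] also keeps uppercase letters, which may or may not be what you want.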

CodePudding user response：

I would use GNU AWK for this task in the following way. Let the content of file.txt be

a,b;c^d"e} f;g,h!;i8j-

then

awk 'BEGIN{FPAT="[a-z]";OFS=","}{$1=$1;print}' file.txt

gives output

a,b,c,d,e,f,g,h,i,j

Explanation: using FPAT, I tell GNU AWK that a field is a single lowercase ASCII letter, and I set the output field separator (OFS) to a comma. Then, for each line, $1=$1 forces the line to be rebuilt with the new separator, and print outputs it.

(tested in GNU Awk 5.0.1)
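FPAT is a GNU AWK extension and is applied one input line at a time, so input that spans two lines would be printed as two separate lines. As a rough sketch of an alternative that should also run in other POSIX awks, the lines can be cleaned individually and joined at the end. The function name join_letters and the variable out are hypothetical, chosen only for this example.

```shell
join_letters() {
  awk '
    {
      gsub(/[^a-z]+/, ",")    # each run of non-lowercase-letter characters becomes one comma
      gsub(/^,|,$/, "")       # drop a comma left at the start or end of this line
      if ($0 != "")           # skip lines that contained no letters at all
        out = (out == "" ? $0 : out "," $0)
    }
    END { print out }         # print the joined result once, after the last line
  ' "$@"
}

printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-' | join_letters
```

With the two-line input from the question this should print a,b,c,d,e,f,g,h,i,j. A file name can be passed instead of using stdin, for example join_letters file.txt.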

CodePudding user response：

$ awk -v RS="^$" '{       # read whole file in
    gsub(/[^a-z]+/,",")   # replace each run of non-lowercase-letter characters with a comma
    sub(/,$/,"")          # remove trailing comma
}1' file                  # output

Output:

a,b,c,d,e,f,g,h,i,j