Home > OS >  CSV Formating with AWK
CSV Formating with AWK

Time:12-21

I have a script that is suppose to run autorep -j <Job Name> -q and fetch the output of the previous command and filter it out to create a CSV file.

The Script:

#!/bin/bash
    
read -p "Enter /DIR/PATH/FILENAME where you wish to copy the data: " FILENAME
echo "Enter the JOB_NAME or %SEARCHSTRING%"

while read -r i;
do
    awk '
        BEGIN {
            print "\"insert_job\",\"job_type\",\"command\",\"machine\",\"owner\",\"date_conditions\",\"condition\",\"run_calendar\",\"exclude_calendar\",\"days_of_week\",\"run_window\",\"start_times\",\"start_mins\",\"profile\",\"term_run_time\",\"watch_file\",\"watch_interval\""
        }
        /^insert_job:/ { jn="\""$2"\""; jt="\""$4"\""; cmd="\" \""; mcn="\" \""; own="\" \""; dc="\" \""; c="\" \""; rc="\" \""; ec="\" \""; dow="\" \""; ruw="\" \""; st="\" \""; sm="\" \""; prof="\" \""; trt="\" \""; wf="\" \""; wi="\" \"" }
        /^command:/ {cmd="\""$ "\""}
        /^machine:/ {mcn="\""$2"\""}
        /^owner:/ {own="\""$2"\""}
        /^date_conditions:/ {dc="\""$2"\""}
        /^condition:/ {c="\""$2"\""}
        /^run_calendar:/ {rc="\""$2"\""}
        /^exclude_calendar:/ {ec="\""$2"\""}
        /^days_of_week:/ {dow="\""$2"\""}
        /^run_window:/ {ruw="\""$2"\""}
        /^start_times:/ {gsub("\"",""); st="\""$2"\""}
        /^start_mins:/ {sm="\""$2"\""}
        /^profile:/ {prof="\""$2"\""}
        /^term_run_time:/ {trt="\""$2"\""}
        /^watch_file:/ {wf="\""$2"\""}
        /^watch_interval:/ {wi="\""$2"\""}
        /_if_terminated/{printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n", jn, jt, cmd, mcn, own, dc, c, rc, ec, dow, ruw, st, sm, prof, trt, wf, wi}
    ' < <(autorep -j $i -q) > $FILENAME.csv

    break
done

Depending of the type of job, the output of autorep command can be little different

Output of Autorep -j <Job Name> -q

JOB_A

insert_job: JOB_A   job_type: FW
machine: machine.server.com
owner: User
permission:
date_conditions: 0
term_run_time: 3
alarm_if_fail: 1
profile: "/tmp/.profile"
alarm_if_terminated: 1
watch_file: "/tmp/Control*"
watch_interval: 60

JOB_B

insert_job: JOB_B   job_type: CMD
command: Autorep -M ALL
machine: machine.server.com
owner: User
permission:
date_conditions: 1
days_of_week: mo,tu,we,th,fr
start_mins: 9,19,29,39,49,59
run_window: "06:00-19:00"
std_out_file: "/tmp/JOB_B.out.txt"
std_err_file: "/tmp/JOB_B.err.txt"
alarm_if_fail: 1
alarm_if_terminated: 1

If i run the above script to get the output in a CSV format then i get the below output:

Script run for JOB_A

"insert_job","job_type","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","profile","term_run_time","watch_file","watch_interval"
"JOB_A","FW"," ","machine.server.com","User","0"," "," "," "," "," "," "," ",""/tmp/.profile"","3"," "," "

Script run got JOB_B

"insert_job","job_type","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","term_run_time","watch_file","watch_interval"
"JOB_B","CMD","command: Autorep -M ALL,"machine.server.com","User","1"," "," "," ","mo,tu,we,th,fr",""06:00-19:00""," ","9,19,29,39,49,59"," "," "," "," "

For JOB_A few of the fields are not even captured and for JOB_B Command and Machine come as one entity

Expected Output:

JOB_A

"insert_job","job_type","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","profile","term_run_time","watch_file","watch_interval"
"JOB_A","FW"," ","machine.server.com","User","0"," "," "," "," "," "," "," ",""/tmp/.profile"","3",""/tmp/Control*"","60"

JOB_B

"insert_job","job_type","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","term_run_time","watch_file","watch_interval"
"JOB_B","CMD","Autorep -M ALL","machine.server.com","User","1"," "," "," ","mo,tu,we,th,fr",""06:00-19:00""," ","9,19,29,39,49,59"," "," "," "," "

CodePudding user response:

If you choose FS properly then you can process the input smoothly:

awk -F ': | {2,}' -v OFS=',' '
    BEGIN {
        nf = split("insert_job,job_type,command,machine,owner,date_conditions,condition,run_calendar,exclude_calendar,days_of_week,run_window,start_times,start_mins,profile,term_run_time,watch_file,watch_interval",header,OFS)
        for (i = 1; i <= nf; i  )
            printf "%s%s", csvescape(header[i]), (i < nf ? OFS : ORS)
    }
    {
        for ( i = 1; i < NF; i  = 2 )
            record[$i] = $(i 1)
    }
    END {
        for ( i = 1; i <= nf; i   )
            printf "%s%s", csvescape(record[header[i]]), (i < nf ? OFS : ORS)
    }

    function csvescape(str,      out) {
        out = str
        gsub(/"/,"\"\"",out)
        return "\"" out "\""
    }
'
"insert_job","job_type","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","profile","term_run_time","watch_file","watch_interval"
"JOB_A","FW","","machine.server.com","User","0","","","","","","","","""/tmp/.profile""","3","""/tmp/Control*""","60"
"insert_job","job_type","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","profile","term_run_time","watch_file","watch_interval"
"JOB_B","CMD","Autorep -M ALL","machine.server.com","User","1","","","","mo,tu,we,th,fr","""06:00-19:00""","","9,19,29,39,49,59","","","",""

remarks:

  • The output isn't exactly the expected one but it is a valid CSV
  • If you really need to output " " for the empty fields then you can add the code for it in the csvescape function.

CodePudding user response:

Addressing just OP's current code ... there are two main issues:

  • /^command:/ {cmd="\""$ "\""} - does not reference a specific field so $ is treated the same as $0 (the whole line), hence the reason the output for JOB B / JOB_B shows the whole line, ie, command: Autorep -M ALL; one solution would be to strip off the first field (plus the field delimiter) before referencing $0
  • /if_terminated/ { printf ...} - is coded with an assumption the line containing if_terminated is the last line in the file, but for JOB A / JOB_A said line is not the last line so the printf is being run before all input lines have been processed; one solution would be to delay the printf until after the entire file has been read (eg, move to an END{} block)

Rolling these changes into OP's current awk code, and adding some formatting to make easier to read:

awk '
    BEGIN               { print "\"insert_job\",\"job_type\",\"command\",\"machine\",\"owner\",\"date_conditions\",\"condition\",\"run_calendar\",\"exclude_calendar\",\"days_of_week\",\"run_window\",\"start_times\",\"start_mins\",\"profile\",\"term_run_time\",\"watch_file\",\"watch_interval\"" }

/^insert_job:/          { jn="\""$2"\""; jt="\""$4"\""; cmd="\" \""; mcn="\" \""; own="\" \""; dc="\" \""; c="\" \""; rc="\" \""; ec="\" \""; dow="\" \""; ruw="\" \""; st="\" \""; sm="\" \""; prof="\" \""; trt="\" \""; wf="\" \""; wi="\" \"" print "dbg0:" wf ":" }

/^command:/             {                   $0=substr($0,index($0,$2));
                           cmd="\""$0"\"" }
/^machine:/             {  mcn="\""$2"\"" }
/^owner:/               {  own="\""$2"\"" }
/^date_conditions:/     {   dc="\""$2"\"" }
/^condition:/           {    c="\""$2"\"" }
/^run_calendar:/        {   rc="\""$2"\"" }
/^exclude_calendar:/    {   ec="\""$2"\"" }
/^days_of_week:/        {  dow="\""$2"\"" }
/^run_window:/          {  ruw="\""$2"\"" }
/^start_times:/         {                   gsub("\"","");
                            st="\""$2"\"" }
/^start_mins:/          {   sm="\""$2"\"" }
/^profile:/             { prof="\""$2"\"" }
/^term_run_time:/       {  trt="\""$2"\"" }
/^watch_file:/          {   wf="\""$2"\"" }
/^watch_interval:/      {   wi="\""$2"\"" }

END                     { printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n",
                                 jn, jt, cmd, mcn, own, dc, c, rc, ec, dow, ruw, st, sm, prof, trt, wf, wi print "dbg2:" wf ":"
                        }
' 

Running this against the 2 input files generates:

"insert_job","job_type","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","profile","term_run_time","watch_file","watch_interval"
"JOB_A","FW"," ","machine.server.com","User","0"," "," "," "," "," "," "," ",""/tmp/.profile"","3",""/tmp/Control*"","60"

"insert_job","job_type","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","profile","term_run_time","watch_file","watch_interval"
"JOB_B","CMD","Autorep -M ALL","machine.server.com","User","1"," "," "," ","mo,tu,we,th,fr",""06:00-19:00""," ","9,19,29,39,49,59"," "," "," "," "
  • Related