Need help to fix an awk script

Time:10-03

I have an awk script whose output I need to fix:

#!/usr/bin/awk -f

BEGIN {

print "Identifier,prob_id,score,comments"

FS = "," 
}

{
if (NR>1)

print "Participant",$1,$4
}

The result of this script is:

Identifier,prob_id,score,comments
Participant 748737 code compiles
Participant 748737 input 101
Participant 748737 input 10011
Participant 748737 empty input
Participant 748737 bad input
Participant 748708 code compiles
Participant 748708 input 101
Participant 748708 input 10011
Participant 748708 empty input
Participant 748708 bad input
Participant 748701 code compiles
Participant 748701 input 101
Participant 748701 input 10011
Participant 748701 empty input
Participant 748701 bad input

The original CSV file data is:

Identifier,prob_id,score,prob_desc
748737,1,0,code compiles
748737,2,0,input 101
748737,3,0,input 10011
748737,4,0,empty input
748737,5,0,bad input
748708,1,0,code compiles
748708,2,0,input 101
748708,3,1,input 10011
748708,4,0,empty input
748708,5,1,bad input
748701,1,0,code compiles
748701,2,0,input 101
748701,3,0,input 10011
748701,4,0,empty input
748701,5,1,bad input

The required output is:

Identifier,prob_id,score,comments
Participant 748737,3_a,10,code compiles
Participant 748737,3_b,5,input 101
Participant 748737,3_c,5,input 10011
Participant 748737,3_d,5,empty input
Participant 748737,3_e,5,bad input
Participant 748708,3_a,10,code compiles
Participant 748708,3_b,5,input 101
Participant 748708,3_c,0,input 10011
Participant 748708,3_d,5,empty input
Participant 748708,3_e,0,bad input
Participant 748701,3_a,10,code compiles
Participant 748701,3_b,5,input 101
Participant 748701,3_c,5,input 10011
Participant 748701,3_d,5,empty input
Participant 748701,3_e,0,bad input

Note

• prob_id values in the second field should be renamed from 1-5 to 3_a, 3_b, …, 3_e
• if the input score value is 1, the transformed output value should be 0; otherwise, if the input score value is 0, the transformed output values should be 10, 5, 5, 5, 5, respectively, for problem ids 1 through 5

CodePudding user response:

You are overthinking what you need to do. All you really need to do is output the lines of the original.csv file unchanged. The only caveat is that for lines (records) greater than 1, you output "Participant " as a prefix. You can do that simply using a ternary to control whether "Participant " prints based on the record number (line number) NR.

For example, all you really need is:

awk '{ print (NR>1 ? "Participant " : "") $0 }' original.csv

Example Use/Output

With your sample data in original.csv you get:

$ awk '{ print (NR>1 ? "Participant " : "") $0 }' original.csv
Identifier,prob_id,score,prob_desc
Participant 748737,1,0,code compiles
Participant 748737,2,0,input 101
Participant 748737,3,0,input 10011
Participant 748737,4,0,empty input
Participant 748737,5,0,bad input
Participant 748708,1,0,code compiles
Participant 748708,2,0,input 101
Participant 748708,3,1,input 10011
Participant 748708,4,0,empty input
Participant 748708,5,1,bad input
Participant 748701,1,0,code compiles
Participant 748701,2,0,input 101
Participant 748701,3,0,input 10011
Participant 748701,4,0,empty input
Participant 748701,5,1,bad input

If you want to write the command in script form (which, from your question, it appears you do), the long-form way without a ternary uses the pattern NR == 1 to handle the first record differently, printing it without a prefix:

#!/usr/bin/awk -f

NR == 1 {
    print $0
    next
}
{
    print "Participant " $0
}

(same output)

CodePudding user response:

This is not a complete answer, but something to get you going. Basically, use awk arrays as lookup tables to perform the mapping.

This is verbose; much more compact code is possible once you figure out the basics.

awk -F, '
BEGIN {
    OFS = ","
    # Lookup table for prob_id
    probid_code[1] = "3_a"
    probid_code[2] = "3_b"
    # ... extend as needed
    probid_code[5] = "3_e"

    # Lookup table for score
    probid_score[1] = 10
    probid_score[2] = 5
    # ... extend as needed
    probid_score[5] = 5
}
NR == 1 {
    # Replace the header line
    print "Identifier", "prob_id", "score", "comments"
}
NR > 1 {
    participant = "Participant " $1
    prob_id = probid_code[$2]
    # Score 1 becomes 0; score 0 becomes the per-problem value
    score = $3 == 1 ? 0 : $3 == 0 ? probid_score[$2] : ""
    comments = $4
    print participant, prob_id, score, comments
}
' original.csv
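
The lookup-table idea above can be sketched in a more compact form by building both tables with split(). This is only a sketch under some assumptions: the table values are taken from the Note and the required output in the question (3_a through 3_e, and 10, 5, 5, 5, 5), the shell function name transform_scores is hypothetical, and the description field is assumed not to contain commas.

```shell
#!/bin/sh
# transform_scores is a hypothetical helper name, not part of the original answers.
# It reads CSV from the named files, or from standard input if none are given.
transform_scores() {
    awk -F, '
    BEGIN {
        OFS = ","
        # split() fills each array with keys 1..5, matching prob_id 1..5.
        # Values are taken from the Note and required output in the question.
        split("3_a 3_b 3_c 3_d 3_e", probid_code, " ")
        split("10 5 5 5 5", probid_score, " ")
    }
    NR == 1 {
        # Replace the header line and skip the data rule.
        print "Identifier", "prob_id", "score", "comments"
        next
    }
    {
        # Input score 1 maps to 0; input score 0 maps to the per-problem value.
        # Any other score is left empty, as in the answer above.
        score = ($3 == 1) ? 0 : (($3 == 0) ? probid_score[$2] : "")
        print "Participant " $1, probid_code[$2], score, $4
    }
    ' "$@"
}
```

With the sample data saved as original.csv, it could be called as transform_scores original.csv. Because the rows are printed with OFS set to a comma, only the "Participant " prefix is joined to the identifier with a space.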