Home > Software design >  How can I sort output from variable?
How can I sort output from variable?

Time:04-06

I want to be able to sort an input csv file that is comma separated by a values created in an extra column. Below is a sample of the input csv file

Timestamp,Email,Name,Year,Make,Model,Car_ID,Judge_ID,Judge_Name,Racer_Turbo,Racer_Supercharged,Racer_Performance,Racer_Horsepower,Car_Overall,Engine_Modifications,Engine_Performance,Engine_Chrome,Engine_Detailing,Engine_Cleanliness,Body_Frame_Undercarriage,Body_Frame_Suspension,Body_Frame_Chrome,Body_Frame_Detailing,Body_Frame_Cleanliness,Mods_Paint,Mods_Body,Mods_Wrap,Mods_Rims,Mods_Interior,Mods_Other,Mods_ICE,Mods_Aftermarket,Mods_WIP,Mods_Overall
8/5/2018 14:10,[email protected],Hernando,2015,Acura,TLX,48,J04,Bob,0,0,2,2,4,4,0,2,4,4,2,4,2,2,2,2,2,0,4,4,4,6,2,0,4
8/5/2018 15:11,[email protected],Noel,2015,Jeep,Wrangler,124,J02,Carl,0,6,4,2,4,6,6,4,4,4,6,6,6,6,6,4,6,6,6,6,6,4,6,4,6
8/5/2018 17:10,[email protected],Edan,2015,Lexus,Is250,222,J05,Adrian,0,0,0,0,0,0,0,0,6,6,6,0,0,6,6,6,0,0,0,0,0,0,0,0,4
8/5/2018 17:34,[email protected],Hieronymus,1993,Honda,Civic eG,207,J06,Aaron,0,0,2,2,2,2,2,2,0,4,2,2,2,2,2,2,4,2,2,0,0,0,2,2,0
8/5/2018 14:30,[email protected],Nickolas,2016,Ford,Mystang,167,J02,Carl,0,0,2,2,0,2,2,0,0,0,0,2,0,2,2,2,0,0,2,0,0,0,0,0,2
8/5/2018 16:12,[email protected],Martin,2013,Hyundai,Gen coupe,159,J04,Bob,0,0,2,0,0,0,2,0,0,0,0,2,0,2,2,0,2,0,2,0,0,0,0,0,0
8/5/2018 17:00,[email protected],Aldridge,2009,Infiniti,G37,20,J06,Aaron,2,0,2,2,0,0,2,0,0,2,2,2,2,2,2,2,2,2,4,2,2,0,2,0

What my code currently does is sift through the csv file, and pick out the car_id column, year, make, and model columns. Then it runs through every column from racer_turbo to the last, and for each row it adds up the values in those columns into a total value and prints that along side the other values (id, make, model, etc.). There is also a ranking column that precedes the other 5 when printed. Here is my code below.

BEGIN {
    FS = ",";
    OFS = "\t";

    print "Ranking", "Car_ID", "Year", "Make", "Model", "Total";
}
{
    rank;
    total = 0;

    if(NR > 1) {
        for(i = 8; i < NF; i  ) {
            total  = $i;
        }   
        print   rank,$7, $4, $5, $6, total;
    }

    rows[$5][total][$0]
}
END {
    print "\n";
    print "Ranking", "Car_ID", "Year", "Make", "Model", "Total";
    ranking;

    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (m in rows) {
        n = asorti(rows[m], t, "@ind_num_desc");
        n = (n>3) ? 3 : n
        for(i = 1; i <= n; i  ) for(s in rows[m][t[i]]) {
            $0 = s;
            $1 =   r;
            print   ranking, $7, $4, $5, $6, total;
        }
    }   
}

What I would like to do in the END block is print the output again, however, rank the cars by top three from each make using the total column which was created in the preceding block of the code. However, what I run my code now the output looks as follows

Ranking Car_ID  Year    Make    Model   Total
1   48  2015    Acura   TLX 58
2   124 2015    Jeep    Wrangler    118
3   222 2015    Lexus   Is250   36
4   207 1993    Honda   Civic eG    40
5   167 2016    Ford    Mystang 18
6   159 2013    Hyundai Gen coupe   14
7   20  2009    Infiniti    G37 36
...

Ranking Car_ID  Year    Make    Model   Total
1   113 2012    Acura   Tsx sportwagon  10
2   112 2008    Acura   TL  10
3   50  2015    Acura   TLX 10
4   15  2014    Audi    S4  10
5   18  2015    Audi    S3  10
6   116 2008    Audi    A4  10
7   2   2016    Bmw M2  10
8   172 2014    Bmw 4   10
9   28  1995    Bmw 318xi   10
...

See how in the total column on the second printed section it shows total is 10 for each printed car, instead of being the same values as they were in the first printed section for each respective car, and the highest 3 totals for each make being displayed.

Below is the expected output

Ranking Car_ID  Year    Make    Model   Total
1       48      2015    Acura   TLX     58                                                                                                                                                                                                                                                      
2       124     2015    Jeep    Wrangler        118                                                                                                                         
3       222     2015    Lexus   Is250   36                                                                                                                                  
4       207     1993    Honda   Civic eG        40                                                                                                                          
5       167     2016    Ford    Mystang 18                                                                                                                                  
6       159     2013    Hyundai Gen coupe       14                                                                                                                          
7       20      2009    Infiniti        G37     36                                                                                                                          
8       178     2009    Honda   Oddesy  66 
...

Ranking Car_ID  Year    Make    Model   Total
1       112     2008    Acura   TL      110
2       50      2015    Acura   TLX     102
3       127     2013    Acura   Tsx     86
4       15      2014    Audi    S4      120
5       18      2015    Audi    S3      38
6       116     2008    Audi    A4      28
7       2       2016    Bmw     M2      24 
8       172     2014    Bmw     4       22
9       111     2007    Bmw     328i    10
10      218     2010    Chevy   Camaro  64
11      170     2014    Chevy   Cruze   50
12      0       2015    Chevy   Camaro  0
...

Is this salvagable with my current code? Or would a better approach be to create a separate awk file that will sort through the generated output and produce another file that is sorted by the top 3?

I'm running GNU AWK v4.0.2.

CodePudding user response:

Assuming the Car_ID (hereinafter referred to as id) is unique across the rows, would you please try:

BEGIN {
    FS = ","
    OFS = "\t"

    print "Ranking", "Car_ID", "Year", "Make", "Model", "Total"
}
{
    rank
    total = 0

    if (NR > 1) {
        for (i = 8; i < NF; i  ) {
            total  = $i
        }   
        print   rank, $7, $4, $5, $6, total
        ttl[$5][$7] = total
        row[$7] = $0
    }
}
END {
    print "\n"
    print "Ranking", "Car_ID", "Year", "Make", "Model", "Total"
    ranking
    id

    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (m in ttl) {
        n = asorti(ttl[m], t, "@val_num_desc")
        n = (n>3) ? 3 : n
        for (i = 1; i <= n; i  ) {
            id = t[i]
            total = ttl[m][id]
            $0 = row[id]
            print   ranking, $7, $4, $5, $6, total
        }
    }
}

I have slightly modified the data structure, assigning the id as the main key. Then created a 2-D array ttl, which holds the value total keyed by make and id. In the END loop, we can retrieve the input data using the id.
As a side note, your original data structure uses total as an index. If multiple rows with the same make happen to have the same value of total, either of the indexes will be overwritten.

  • Related