Awk error fatal: attempt to use array in a scalar context

Time:05-30

I need to write an awk script that calculates the mean and the standard deviation of the "Population" column for each continent (the data covers several continents).

My script is as follows:

BEGIN {
    FS = ","
    Continent["Europe"];Continent["Africa"];Continent["Asia"];Continent["Latin America and the Caribbean"];Continent["Oceania"]
}
FNR>1 {
    if ($4!="" && $11!="") {
        found++
        n[$4]++
        wx[$4] += $3
        wxx[$4] += $3 * $3
    }
}
END {
    print "Continent,Mean,Deviation"
    for (i in Continent) {
        if (n[i] > 0) {
            avg[i] = wx[i] / n[i]
            var = wxx[i] / n[i] - avg[i] * avg[i]

            if (var >= 0)
                std[i] = sqrt(var)
            else
                std[i] = 0

            printf ("%s,%.2f,%.2f%\n", Continent,avg[i],std[i])
        }
    }
}



A sample of my dataset:

Country,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Total Cases per 1 Mil.pop,Total Deaths per 1 Mil.pop,Death percentage,Survival Percentage,No infected Percentage
Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.31,0.42,99.56
Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.28,9.41,90.47
Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.59,0.57,99.41


My desired output:

Continent Mean      Deviation
Africa  42847108   3298802049
Asia    1938293848 23984033
Europe  190319838   12020492


However, when I run the code with gawk -f script.awk dataset.csv, I get the error:

fatal: attempt to use array `Continent' in a scalar context

How can this be solved?

Answer:

GNU AWK cannot print an array as a whole; passing the array name Continent to printf is what leads to

fatal: attempt to use array `Continent' in a scalar context

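You need to use the key currently being processed, which is i in your case. That is, replace

printf ("%s,%.2f,%.2f%\n", Continent,avg[i],std[i])

with

printf ("%s,%.2f,%.2f\n",i,avg[i],std[i])

Note that the stray % before \n is dropped as well: %\n is not a valid conversion, so write %% if you actually want a literal percent sign in the output.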
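Putting that fix into the whole script, here is a minimal, self-contained sketch that runs it on the three sample rows from the question. It assumes gawk or another POSIX awk is on the PATH; the shell functions sample and continent_stats are hypothetical wrappers added only for this demo, and the per-row lines use ++ and += to keep a running count, sum and sum of squares. Because for (i in Continent) visits keys in an unspecified order, the data lines may come out in any order.

```shell
#!/bin/sh
# Hypothetical demo helper: the question's header plus its three sample rows.
sample() {
    cat <<'EOF'
Country,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Total Cases per 1 Mil.pop,Total Deaths per 1 Mil.pop,Death percentage,Survival Percentage,No infected Percentage
Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.31,0.42,99.56
Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.28,9.41,90.47
Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.59,0.57,99.41
EOF
}

# Hypothetical wrapper: the question's script with the printf fix applied
# (print the key i, not the array Continent). Reads files or stdin.
continent_stats() {
    awk '
    BEGIN {
        FS = ","
        Continent["Europe"]; Continent["Africa"]; Continent["Asia"]
        Continent["Latin America and the Caribbean"]; Continent["Oceania"]
    }
    FNR > 1 && $4 != "" && $11 != "" {
        n[$4]++                 # rows seen for this continent
        wx[$4]  += $3           # sum of Population
        wxx[$4] += $3 * $3      # sum of squared Population
    }
    END {
        print "Continent,Mean,Deviation"
        for (i in Continent) {  # i holds one key (a continent name) per pass
            if (n[i] > 0) {
                avg[i] = wx[i] / n[i]
                var = wxx[i] / n[i] - avg[i] * avg[i]
                std[i] = (var >= 0 ? sqrt(var) : 0)
                printf "%s,%.2f,%.2f\n", i, avg[i], std[i]
            }
        }
    }' "$@"
}

sample | continent_stats
```

With only one country per continent in the sample, each mean is simply that country's population and each deviation is 0.00, for example Asia,40462186.00,0.00.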

Regarding your desired output:

If you want fixed-width columns, put a number after the % sign to right-justify a field to that width, or a negative number (-number) to left-justify it. Consider the following simple example; let file.txt contain

Able 1000
Baker 150
Charlie 200

and say you want to turn it into fixed-width format; then you can do

awk '{printf "%-10s%7.2f\n", $1, $2}' file.txt

and get output

Able      1000.00
Baker      150.00
Charlie    200.00

For more details, consult the "Modifiers for printf Formats" section of the GNU Awk manual.

(tested in GNU Awk 5.0.1)
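Applied to the continent report, the same modifiers might look like the sketch below. The widths are arbitrary choices (32 for the name, since "Latin America and the Caribbean" is 31 characters, and 15 for each number), and sample and continent_report are hypothetical demo helpers. It also walks an ordered list built with split() instead of for (key in array), so the rows come out in a fixed order.

```shell
#!/bin/sh
# Hypothetical demo helper: the question's header plus its three sample rows.
sample() {
    cat <<'EOF'
Country,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Total Cases per 1 Mil.pop,Total Deaths per 1 Mil.pop,Death percentage,Survival Percentage,No infected Percentage
Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.31,0.42,99.56
Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.28,9.41,90.47
Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.59,0.57,99.41
EOF
}

# Hypothetical helper: per-continent mean and deviation in fixed-width columns.
continent_report() {
    awk '
    BEGIN {
        FS = ","
        # Ordered list of continents; looping over 1..n gives a stable output
        # order, unlike "for (key in array)", whose order is unspecified.
        n = split("Africa,Asia,Europe,Latin America and the Caribbean,Oceania", order, ",")
    }
    FNR > 1 && $4 != "" && $11 != "" {
        cnt[$4]++
        sum[$4]   += $3
        sumsq[$4] += $3 * $3
    }
    END {
        # %-32s: left-justified in 32 columns; %15s and %15.2f: right-justified in 15.
        printf "%-32s%15s%15s\n", "Continent", "Mean", "Deviation"
        for (k = 1; k <= n; k++) {
            c = order[k]
            if (!(c in cnt)) continue       # skip continents with no rows
            avg = sum[c] / cnt[c]
            var = sumsq[c] / cnt[c] - avg * avg
            printf "%-32s%15.2f%15.2f\n", c, avg, (var > 0 ? sqrt(var) : 0)
        }
    }' "$@"
}

sample | continent_report
```

On the sample this prints the header followed by the Africa, Asia and Europe rows, each exactly 62 characters wide.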

Answer:

It's important to use singulars and plurals when naming scalars and arrays. Your array should be named Continents[] (plural), since it contains multiple continent names; then for (i in Continent) becomes for (Continent in Continents), and all the rest becomes obvious.

Try this (untested):

BEGIN {
    FS = ","
    split("Europe,Africa,Asia,Latin America and the Caribbean,Oceania",tmp)
    for (i in tmp) {
        Continent = tmp[i]
        Continents[Continent]
    }
}
FNR>1 {
    Continent = $4
    if (Continent !="" && $11!="") {
        found++
        cnts[Continent]++
        wxs[Continent] += $3
        wxxs[Continent] += $3 * $3
    }
}
END {

    print "Continent,Mean,Deviation"
    for (Continent in cnts) {
        cnt = cnts[Continent]
        wx  = wxs[Continent]
        wxx = wxxs[Continent]

        avg = wx / cnt
        var = wxx / cnt - avg * avg
        std = (var >= 0 ? sqrt(var) : 0)

        printf ("%s,%.2f,%.2f\n", Continent, avg, std)
     }
}

You don't need the cnt, wx, and wxx variables; I'm only using them to show the clarity and simplicity you get from plural names for arrays and singular names for the scalars that hold their contents.
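For reference, both the question's script and this one compute the population standard deviation (dividing by n, not n - 1) from two running sums per continent: the sum of populations (wx / wxs) and the sum of squared populations (wxx / wxxs). With x_1, ..., x_n the populations of one continent's countries:

```latex
$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; \bar{x}^2,
\qquad
\sigma = \sqrt{\sigma^2}
$$
```

The two terms in the variance can be nearly equal, so floating-point rounding may leave a tiny negative difference; that is why both scripts clamp var to 0 before calling sqrt. If the sample standard deviation is wanted instead, multiply the variance by n/(n - 1) (for n > 1) before taking the square root.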
