Create an awk script to generate a new column in a dataset?-CodePudding

I am working on the following dataset and I would like to create an awk script to make an arithmetic operation between with other columns and add the result of each record in a new column that could be called "Survival Percentage" with only 2 decimals.

The operation would be the following:

((Total Cases - Total Deaths)/Population)*100

Below you can see a sample of the dataset:

Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage
Afghanistan,Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.313743132
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.275057509
Algeria,Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.587215976
Andorra,Andorra,AND,77481,Europe,40024,153,516565,1975,0.382270638

Here is an example of the desired output after running as: gawk -F file.awk dataset.csv

Country,Other names,ISO 3166-1 alpha-3 CODE,Population,Continent,Total Cases,Total Deaths,Tot Cases//1M pop,Tot Deaths/1M pop,Death percentage, Survival Percentage
Afghanistan,Afghanistan,AFG,40462186,Asia,177827,7671,4395,190,4.313743132,0.42
Albania,Albania,ALB,2872296,Europe,273870,3492,95349,1216,1.275057509,9.41
Algeria,Algeria,DZA,45236699,Africa,265691,6874,5873,152,2.587215976,0.57
Andorra,Andorra,AND,77481,Europe,40024,153,516565,1975,0.382270638,51.54

The code I have tried to implement is as follows but it doesn't even run and it is a shell script and not an awk script as I wish.

awk  'BEGIN { FS=OFS="," } NR == 1 { $11="new" } NR > 1 { $11=(($6-$7)/$4)*100 }1' dataset.csv

From comments: After eliminating ^M as you told me, I have detected that there are rows in the "population" column that should be numerical and there is a string. Do you have any idea to discard the records that meet this condition using also awk and then perform the operation of my code? Any idea?

CodePudding user response：

Considering your samples and shown attempts here is the awk code which you can try. Which removes Control M characters(which we found out in comments) and this also checks that your 6th, 7th and 4th columns should be integers/floats etc and nothing else, try it out once.

awk '
BEGIN  { FS=OFS="," }
{ gsub(/\r/,"") }
FNR==1 { $11="new"  }
FNR > 1 && ($6 0=$6 && $7 0=$7 && $4 0=$4){
  $11=(($6-$7)/$4)*100
}
1' dataset.csv

OR to get values till 2 decimal points try something like(make use of sprintf function to save values to $11 itself:

awk '
BEGIN  { FS=OFS="," }
{ gsub(/\r/,"") }
FNR==1 { $11="new"  }
FNR > 1 && ($6 0=$6 && $7 0=$7 && $4 0=$4){
  $11=sprintf("%0.2f",(($6-$7)/$4)*100)
}
1' dataset.csv