Home > Software engineering >  How to write an efficient bash script to modify a csv file
How to write an efficient bash script to modify a csv file

Time:01-02

I have created a script to modify a CSV file, but it does not work correctly:

  1. The awk -F, 'NR==1| does not print the header but the second row.

  2. Even though there are only some columns specified in while read V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V14 V15 V16 it prints out all columns of the csv.

Some sample data:

 COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),GINI INDEX (WORLD BANK ESTIMATE) AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED GALLUP","MOST PEOPLE CAN BE TRUSTED WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED WVS ROUND 2010-2014"
AFGHANISTAN,2008,3.72,7.16,0.45,50.79,0.71,0.17,0.88,0.51,0.25,0.61,-1.92,-1.65,1.77,0.47,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2009,4.40,7.33,0.55,51.20,0.67,0.20,0.85,0.58,0.23,0.61,-2.04,-1.63,1.72,0.39,NULL,NULL,0.44,0.28,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2010,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2011,3.83,7.41,0.52,51.91,0.49,0.17,0.73,0.61,0.26,0.30,-1.91,-1.61,1.78,0.46,NULL,NULL,0.33,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2012,3.78,7.51,0.52,52.24,0.53,0.24,0.77,0.71,0.26,0.43,-1.84,-1.40,1.79,0.47,NULL,NULL,0.34,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2013,3.57,7.52,0.48,52.56,0.57,0.07,0.82,0.62,0.27,0.48,-1.87,-1.40,1.22,0.34,NULL,NULL,0.30,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2014,3.13,7.51,0.52,52.88,0.50,0.11,0.87,0.53,0.37,0.40,-1.77,-1.31,1.39,0.44,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2015,3.98,7.50,0.52,53.20,0.38,0.08,0.88,0.55,0.33,0.26,-1.84,-1.29,2.16,0.54,NULL,NULL,0.59,NULL,NULL,NULL,NULL,NULL,NULL,

This is my script:

#!/usr/bin/env bash

INPUT=./happiness.csv.1
OLDIFS=$IFS
IFS=','
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }


while read V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V14 V15 V16 

do
      awk -F, 'NR==1 || /^[A-Z,a-z]*,2018/' | 
      grep '^.*-.*$' 
                                                
done < $INPUT

And this is the output:

> AFGHANISTAN,2008,3.72,7.16,0.45,50.79,0.71,0.17,0.88,0.51,0.25,0.61,-1.92,-1.65,1.77,0.47,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
> AFGHANISTAN,2018,2.69,7.49,0.50,52.59,0.37,-0.08,0.92,0.42,0.40,0.36,NULL,NULL,1.40,0.52,NULL,NULL,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
> ALGERIA,2018,5.04,9.55,0.79,65.90,0.58,-0.17,0.75,0.59,0.29,NULL,NULL,NULL,1.97,0.39,NULL,0.27,0.66,NULL,NULL,NULL,NULL,0.10,NULL,0.17
> ARGENTINA,2018,5.79,9.80,0.89,68.80,0.84,-0.20,0.85,0.82,0.32,0.26,NULL,NULL,2.47,0.42,NULL,0.46,0.40,NULL,0.27,0.22,0.17,0.15,0.17,0.19
> ARMENIA,2018,5.06,9.11,0.81,66.90,0.80,-0.14,0.67,0.58,0.45,0.67,NULL,NULL,2.10,0.41,NULL,0.31,0.40,NULL,NULL,NULL,0.23,NULL,NULL,0.10
> AZERBAIJAN,2018,5.16,9.67,0.78,65.5,0.77,-0.25,0.56,0.59,0.19,0.83,NULL,NULL,1.86,0.36,NULL,0.21,0.26,NULL,NULL,NULL,0.19,NULL,NULL,0.14
> BANGLADESH,2018,4.49,8.22,0.70,64.30,0.90,-0.03,0.70,0.54,0.36,0.83,NULL,NULL,2.15,0.47,NULL,0.32,0.36,NULL,NULL,NULL,0.20,0.23,NULL,
> BELARUS,2018,5.23,9.77,0.90,66.09,0.64,-0.18,0.71,0.45,0.23,0.42,NULL,NULL,1.69,0.32,NULL,0.28,0.29,NULL,NULL,0.25,0.22,NULL,NULL,0.32
> BELGIUM,2018,6.89,10.67,0.92,72,0.80,-0.12,0.63,0.74,0.25,0.44,NULL,NULL,1.54,0.22,NULL,0.28,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
> BOLIVIA,2018,5.91,8.86,0.82,63.59,0.86,-0.08,0.78,0.74,0.38,0.39,NULL,NULL,2.20,0.37,NULL,0.52,0.45,NULL,NULL,NULL,NULL,NULL,NULL,
> BOTSWANA,2018,3.46,9.68,0.79,58.90,0.81,-0.25,0.80,0.72,0.26,0.71,NULL,NULL,2.63,0.76,NULL,0.62,0.61,NULL,NULL,NULL,NULL,NULL,NULL,
> BRAZIL,2018,6.19,9.55,0.88,66.40,0.75,-0.12,0.76,0.74,0.34,0.16,NULL,NULL,2.63,0.42,NULL,0.54,0.42,NULL,NULL,0.06,NULL,NULL,0.09,0.07
> BULGARIA,2018,5.09,9.87,0.92,66.80,0.72,-0.17,0.95,0.63,0.18,0.21,NULL,NULL,2.03,0.39,NULL,0.35,0.34,NULL,NULL,NULL,0.23,NULL,0.19,
> BURUNDI,2018,3.77,6.54,0.48,53.40,0.64,-0.01,0.59,0.66,0.36,NULL,NULL,NULL,2.81,0.74,NULL,0.36,0.68,NULL,NULL,NULL,NULL,NULL,NULL,
> CHILE,2018,6.43,10.06,0.89,69.90,0.78,-0.07,0.81,0.83,0.27,0.33,NULL,NULL,2.15,0.33,NULL,0.49,0.43,NULL,NULL,0.22,0.21,0.22,0.12,0.12
> CHINA,2018,5.13,9.69,0.78,69.30,0.89,-0.17,NULL,0.85,0.18,NULL,NULL,NULL,2.26,0.44,NULL,0.42,0.53,NULL,NULL,0.59,0.50,0.54,0.49,0.61

Can anyone see what am I doing wrong?

Could I exchange the awk statement to something like:

while read V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ....

do
      if [[ "$V2" = "2018" ]];then

CodePudding user response:

Based on the current intent of your code, this is the modification I can suggest:

#!/usr/bin/env bash

INPUT=./happiness.csv.1
[[ -f $INPUT ]] || { echo "$INPUT file not found"; exit 99; }
awk -F, 'NR == 1 || $2 == "2018" && /-/' "$INPUT"
  • Related