How to apply sed to a grep result in a bash script?-CodePudding

I want to modify a csv file with some sed commands but only on the rows that match certain regex.

I have a grep command that works fine in a script:

#!/usr/bin/bash   
   
egrep  '^[A-Z][a-z]*,2018' happiness.csv

And the desired sed commands that work correctly:

#!/usr/bin/bash   
       
sed  -re '

 s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
 s/[a-z]/\U&/g
 s/([0-9] \.[0-9]{2})[0-9] /\1/g
  
' happiness.csv

When I combine them together in one script the grep command gets omitted and the script runs only the sed commands:

#!/usr/bin/bash   
   
egrep  '^[A-Z][a-z]*,2018' happiness.csv

sed  -re '

 s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
 s/[a-z]/\U&/g
 s/([0-9] \.[0-9]{2})[0-9] /\1/g
  
' happiness.csv

Can anyone help?

Sample data:

Country name,Year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect,Confidence in national government,Democratic Quality,Delivery Quality,Standard deviation of ladder by country-year,Standard deviation/Mean of ladder by country-year,GINI index (World Bank estimate),"GINI index (World Bank estimate), average 2000-16","gini of household income reported in Gallup, by wp5-year","Most people can be trusted, Gallup","Most people can be trusted, WVS round 1981-1984","Most people can be trusted, WVS round 1989-1993","Most people can be trusted, WVS round 1994-1998","Most people can be trusted, WVS round 1999-2004","Most people can be trusted, WVS round 2005-2009","Most people can be trusted, WVS round 2010-2014"
Afghanistan,2008,3.723589897,7.168690205,0.450662315,50.79999924,0.718114316,0.177888572,0.88168633,0.517637193,0.25819549,0.61207211,-1.929689646,-1.655084372,1.774661899,0.476599723,,,,,,,,,,
Afghanistan,2009,4.401778221,7.333789825,0.55230844,51.20000076,0.678896368,0.200178429,0.850035429,0.583925605,0.23709242,0.611545205,-2.044092655,-1.635024786,1.722687602,0.391361743,,,0.441905767,0.286315262,,,,,,
Afghanistan,2018,4.75838089,7.386628628,0.539075196,51.59999847,0.60012722,0.13435255,0.706766069,0.61826545,0.275323808,0.299357414,-1.991810083,-1.617176056,1.878621817,0.394802749,,,0.327318162,0.275832713,,,,,,
Afghanistan,2011,3.83171916,7.415018559,0.521103561,51.91999817,0.495901406,0.172136664,0.731108546,0.611387312,0.267174691,0.307385713,-1.919018269,-1.616221189,1.78535974,0.465942234,,,0.336764246,,,,,,,
Afghanistan,2012,3.782937527,7.517126083,0.520636737,52.24000168,0.530935049,0.244272724,0.775619805,0.710384727,0.267919123,0.435440153,-1.842995763,-1.40407753,1.798283219,0.47536689,,,0.344539613,,,,,,,
Afghanistan,2013,3.572100401,7.522237778,0.48355186,52.56000137,0.577955365,0.070402659,0.8232041,0.620584846,0.273328096,0.482847273,-1.879708767,-1.403035522,1.223689914,0.342568725,,,0.304368466,,,,,,,
Afghanistan,2014,3.130895615,7.516955376,0.525568426,52.88000107,0.508514047,0.113184482,0.871241987,0.531691492,0.374860734,0.409047514,-1.773256779,-1.312502503,1.395396113,0.445685923,,,0.413973927,,,,,,,
Afghanistan,2015,3.982854605,7.500538826,0.528597236,53.20000076,0.388927579,0.089090675,0.880638301,0.553553164,0.339276046,0.260557145,-1.84436357,-1.29159379,2.16061759,0.542479634,,,0.59691757,,,,,,,
Albania,2018,4.220168591,7.497038364,0.559071779,53,0.522566199,0.051364917,0.793245554,0.564952672,0.348332286,0.324989557,-1.855426311,-1.392712831,1.796219468,0.42562741,,,0.418629497,,,,,,,

Desired output:

COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-2014" 

AFGHANISTAN,2018,2.69,7.49,-0.50,52.59,0.37,-0.08,0.92,0.42,0.40,0.36,NULL,NULL,1.40,0.52,NULL,NULL,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
ALBANIA,2018,4.63,9.07,-0.82,65.80,0.52,-0.01,0.87,0.55,0.24,0.30,-0.04,-0.42,1.76,0.38,NULL,0.30,NULL,NULL,NULL,NULL,0.24,0.23,NULL,       ARGENTINA,2018,5.48,9.16,-0.83,66.19,0.52,-0.16,0.86,0.64,0.27,NULL,0.04,-0.26,1.91,0.34,NULL,0.30,0.61,0.11,NULL,NULL,0.24,0.23,NULL,

CodePudding user response：

You can search using same regex that you have in egrep and make sure to group all substitution commands:

sed -nE '1p; /^[A-Z][a-z]*,2018/ {
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z] /\U&/g
s/([0-9] \.[0-9]{2})[0-9] /\1/gp
}' happiness.csv

AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL

CodePudding user response：

I'm not a bash pro but this should work:

#!/usr/bin/bash   

grep_res=$(egrep  '^[Aa] .*,2018' happiness.csv)

echo "$grep_res" | sed  -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9] \.[0-9]{2})[0-9] /\1/g
'

What it does is it saves output of the grep at grep_res variable and than feeds sed command with it.

CodePudding user response：

Here is the same solution with standard Linux awk (gawk) script.

Includes dealing with first line.

script.awk

{ $0 = toupper($0);}   #Upper case each incoming line
/^[A-Z]*,2018/ || NR == 1 {    # deal with first line or matching with /^[A-Z]*,2018/
  $0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace ,, with ,NULL,
  $0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace remaining ,, with ,NULL,
  $0 = gensub(/([0-9] .[0-9])([0-9])([0-9])*/, "\\1\\2", "g", $0); # trim decimal point numbers
  print $0; # print output line
}

running

 awk -f script.awk happiness.csv

output

$ awk -f script.awk input.csv
COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-19","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-19","MOST PEOPLE CAN BE TRUSTED, 
WVS ROUND 1994-19","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-20","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-20","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-20"
AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL