I want to modify a csv file with some sed commands but only on the rows that match certain regex.
I have a grep command that works fine in a script:
#!/usr/bin/bash
egrep '^[A-Z][a-z]*,2018' happiness.csv
And the desired sed commands that work correctly:
#!/usr/bin/bash
sed -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9] \.[0-9]{2})[0-9] /\1/g
' happiness.csv
When I combine them together in one script the grep command gets omitted and the script runs only the sed commands:
#!/usr/bin/bash
egrep '^[A-Z][a-z]*,2018' happiness.csv
sed -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9] \.[0-9]{2})[0-9] /\1/g
' happiness.csv
Can anyone help?
Sample data:
Country name,Year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect,Confidence in national government,Democratic Quality,Delivery Quality,Standard deviation of ladder by country-year,Standard deviation/Mean of ladder by country-year,GINI index (World Bank estimate),"GINI index (World Bank estimate), average 2000-16","gini of household income reported in Gallup, by wp5-year","Most people can be trusted, Gallup","Most people can be trusted, WVS round 1981-1984","Most people can be trusted, WVS round 1989-1993","Most people can be trusted, WVS round 1994-1998","Most people can be trusted, WVS round 1999-2004","Most people can be trusted, WVS round 2005-2009","Most people can be trusted, WVS round 2010-2014"
Afghanistan,2008,3.723589897,7.168690205,0.450662315,50.79999924,0.718114316,0.177888572,0.88168633,0.517637193,0.25819549,0.61207211,-1.929689646,-1.655084372,1.774661899,0.476599723,,,,,,,,,,
Afghanistan,2009,4.401778221,7.333789825,0.55230844,51.20000076,0.678896368,0.200178429,0.850035429,0.583925605,0.23709242,0.611545205,-2.044092655,-1.635024786,1.722687602,0.391361743,,,0.441905767,0.286315262,,,,,,
Afghanistan,2018,4.75838089,7.386628628,0.539075196,51.59999847,0.60012722,0.13435255,0.706766069,0.61826545,0.275323808,0.299357414,-1.991810083,-1.617176056,1.878621817,0.394802749,,,0.327318162,0.275832713,,,,,,
Afghanistan,2011,3.83171916,7.415018559,0.521103561,51.91999817,0.495901406,0.172136664,0.731108546,0.611387312,0.267174691,0.307385713,-1.919018269,-1.616221189,1.78535974,0.465942234,,,0.336764246,,,,,,,
Afghanistan,2012,3.782937527,7.517126083,0.520636737,52.24000168,0.530935049,0.244272724,0.775619805,0.710384727,0.267919123,0.435440153,-1.842995763,-1.40407753,1.798283219,0.47536689,,,0.344539613,,,,,,,
Afghanistan,2013,3.572100401,7.522237778,0.48355186,52.56000137,0.577955365,0.070402659,0.8232041,0.620584846,0.273328096,0.482847273,-1.879708767,-1.403035522,1.223689914,0.342568725,,,0.304368466,,,,,,,
Afghanistan,2014,3.130895615,7.516955376,0.525568426,52.88000107,0.508514047,0.113184482,0.871241987,0.531691492,0.374860734,0.409047514,-1.773256779,-1.312502503,1.395396113,0.445685923,,,0.413973927,,,,,,,
Afghanistan,2015,3.982854605,7.500538826,0.528597236,53.20000076,0.388927579,0.089090675,0.880638301,0.553553164,0.339276046,0.260557145,-1.84436357,-1.29159379,2.16061759,0.542479634,,,0.59691757,,,,,,,
Albania,2018,4.220168591,7.497038364,0.559071779,53,0.522566199,0.051364917,0.793245554,0.564952672,0.348332286,0.324989557,-1.855426311,-1.392712831,1.796219468,0.42562741,,,0.418629497,,,,,,,
Desired output:
COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-2014"
AFGHANISTAN,2018,2.69,7.49,-0.50,52.59,0.37,-0.08,0.92,0.42,0.40,0.36,NULL,NULL,1.40,0.52,NULL,NULL,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
ALBANIA,2018,4.63,9.07,-0.82,65.80,0.52,-0.01,0.87,0.55,0.24,0.30,-0.04,-0.42,1.76,0.38,NULL,0.30,NULL,NULL,NULL,NULL,0.24,0.23,NULL, ARGENTINA,2018,5.48,9.16,-0.83,66.19,0.52,-0.16,0.86,0.64,0.27,NULL,0.04,-0.26,1.91,0.34,NULL,0.30,0.61,0.11,NULL,NULL,0.24,0.23,NULL,
CodePudding user response:
You can search using same regex that you have in egrep
and make sure to group all substitution commands:
sed -nE '1p; /^[A-Z][a-z]*,2018/ {
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z] /\U&/g
s/([0-9] \.[0-9]{2})[0-9] /\1/gp
}' happiness.csv
AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL
CodePudding user response:
I'm not a bash pro but this should work:
#!/usr/bin/bash
grep_res=$(egrep '^[Aa] .*,2018' happiness.csv)
echo "$grep_res" | sed -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9] \.[0-9]{2})[0-9] /\1/g
'
What it does is it saves output of the grep at grep_res
variable and than feeds sed command with it.
CodePudding user response:
Here is the same solution with standard Linux awk
(gawk) script.
Includes dealing with first line.
script.awk
{ $0 = toupper($0);} #Upper case each incoming line
/^[A-Z]*,2018/ || NR == 1 { # deal with first line or matching with /^[A-Z]*,2018/
$0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace ,, with ,NULL,
$0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace remaining ,, with ,NULL,
$0 = gensub(/([0-9] .[0-9])([0-9])([0-9])*/, "\\1\\2", "g", $0); # trim decimal point numbers
print $0; # print output line
}
running
awk -f script.awk happiness.csv
output
$ awk -f script.awk input.csv
COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-19","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-19","MOST PEOPLE CAN BE TRUSTED,
WVS ROUND 1994-19","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-20","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-20","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-20"
AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL