Awk loop select only first word from the third column

Time:05-05

I need to create a new file with an awk script that modifies the "name" column by deleting the surnames. It must be done with a while or for loop.

Original csv:

id,name,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
3,Tim Elliot,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
4,Lewis Lee Lembke,2015-01-02,shot,gun,47,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
8,Matthew Hoffman,2015-01-04,shot,toy weapon,32,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True

The expected output:

id,name,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
3,Tim,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
4,Lewis,2015-01-02,shot,gun,47,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
8,Matthew,2015-01-04,shot,toy weapon,32,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True

CodePudding user response:

I would use GNU AWK for this task as follows, in order to comply with the requirement that it must be done with a while loop. Let the content of file.txt be

id,name,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
3,Tim Elliot,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
4,Lewis Lee Lembke,2015-01-02,shot,gun,47,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
8,Matthew Hoffman,2015-01-04,shot,toy weapon,32,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True

then

awk 'BEGIN{FS=OFS=","}{while(sub(/ [[:alpha:]]*$/,"",$2)){}}{print}' file.txt

output

id,name,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
3,Tim,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
4,Lewis,2015-01-02,shot,gun,47,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
8,Matthew,2015-01-04,shot,toy weapon,32,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True

Explanation: first I tell GNU AWK that both the field separator (FS) and the output field separator (OFS) are ,. Then I use a while statement to remove, from the 2nd field, a space followed by zero or more (*) letters ([[:alpha:]]) immediately before the end of the string, replacing it with an empty string. The sub string function modifies the variable it is given, in this case the 2nd field ($2), and returns 1 if a substitution was made and 0 otherwise, so the while loop terminates when no further change is possible. After the loop ends, I print the changed line.

(tested in gawk 4.2.1)
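
To see why the while loop terminates, the following sketch prints the 2nd field after each successful sub call. The function name trace_strip and the shortened three-column input are made up for illustration, and the regex assumes that surnames consist only of letters, as in the sample data.

```shell
# Hypothetical helper: reads CSV lines on stdin and shows each pass of the loop.
trace_strip() {
  awk 'BEGIN { FS = OFS = "," }
  {
    n = 0
    # sub() returns 1 while a trailing " word" can still be removed, 0 otherwise
    while (sub(/ [[:alpha:]]*$/, "", $2)) {
      n++
      printf "pass %d: %s\n", n, $2
    }
  }'
}

# Shortened sample row (only the first three columns)
printf '%s\n' '4,Lewis Lee Lembke,2015-01-02' | trace_strip
```

For this row the loop body runs twice, first removing " Lembke" and then " Lee". The third call to sub finds no match and returns 0, so the loop exits with "Lewis" left in the field.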

CodePudding user response:

Here we can print all the fields except $2 as they are. For $2, we split it on spaces and print only the first element.

echo "3,Tim Elliot,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True" | awk -F"," '{for (i = 1; i <= NF; i++) {v = $i; if (i == 2) {split($i, a, " "); v = a[1]} printf "%s%s", v, (i < NF ? "," : "\n")}}'

output

3,Tim,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
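
Since the question asks for a new file built from every row, the same split idea could be wrapped up roughly as below. This is only a sketch: strip_surnames and its two file arguments are hypothetical names, and it assumes no field contains a quoted comma, because plain FS="," splitting does not understand CSV quoting.

```shell
# Hypothetical helper: reads the CSV named by $1 and writes the result to $2.
strip_surnames() {
  awk 'BEGIN { FS = OFS = "," }
  {
    for (i = 1; i <= NF; i++) {
      if (i == 2) {
        # keep only the first space-separated word of the name column
        split($i, parts, " ")
        $i = parts[1]
      }
    }
    print   # the header row passes through unchanged, since "name" is one word
  }' "$1" > "$2"
}
```

It would be called as strip_surnames file.txt new_file.txt. Setting OFS to "," matters here, because assigning to $i makes awk rebuild the line with OFS before print.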