I need to make an awk script to parse text in a file. I am not sure if I am doing it correctly

Time:10-10

Hi, I need to write an awk script to parse a CSV file and sort it in bash. I need to get a list of presidents from Wikipedia and sort them by their years in office. Once everything is sorted, each era needs to be in its own text file. I'm not sure I am doing it correctly.

Here is a portion of my csv file:

28,Woodrow Wilson,http:..en.wikipedia.org.wiki.Woodrow_Wilson,4.03.1913,4.03.1921,Democratic ,WoodrowWilson.gif,thmb_WoodrowWilson.gif,New Jersey
29,Warren G. Harding,http:..en.wikipedia.org.wiki.Warren_G._Harding,4.03.1921,2.8.1923,Republican ,WarrenGHarding.gif,thmb_WarrenGHarding.gif,Ohio 

I want to include $2, which I think is the name, and sort by $4, which I think is the date the president took office.
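For the sorting half of the task, here is a minimal sketch, assuming every date in $4 is day.month.year as in the sample rows and that no field contains a quoted comma: build a sortable YYYYMMDD key from $4, sort on it numerically, then keep only the name. The function name by_start_date and the temporary file are hypothetical, for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print the names in field 2 ordered by the date in field 4.
# Assumes field 4 is D.M.YYYY (e.g. 4.03.1913) and no field contains a quoted comma.
by_start_date() {
    awk -F, '{
        split($4, d, ".")                         # d[1]=day, d[2]=month, d[3]=year
        printf "%04d%02d%02d\t%s\n", d[3], d[2], d[1], $2
    }' "$1" | sort -n | cut -f2                   # sort on the key, then drop it
}

# Demo on the two sample rows from the question, deliberately out of order.
csv=$(mktemp)
cat > "$csv" <<'EOF'
29,Warren G. Harding,http:..en.wikipedia.org.wiki.Warren_G._Harding,4.03.1921,2.8.1923,Republican ,WarrenGHarding.gif,thmb_WarrenGHarding.gif,Ohio
28,Woodrow Wilson,http:..en.wikipedia.org.wiki.Woodrow_Wilson,4.03.1913,4.03.1921,Democratic ,WoodrowWilson.gif,thmb_WoodrowWilson.gif,New Jersey
EOF
by_start_date "$csv"    # prints Woodrow Wilson, then Warren G. Harding
rm -f "$csv"
```

The key is built with leading zeros so that a plain numeric sort puts the dates in chronological order.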

Here is my actual awk file:

#!/usr/bin/awk -f
 -F, '{
if (substr($4,length($4)-3,2) == "17")
 { print $2 > Presidents1700 }
else if (substr($4,length($4)-3,2) == "18")
{ print $2 > Presidents1800 }
else if (substr($4,length($4)-3,2) == "19")
{ print $2 > Presidents1900 }
else if (substr($4,length($4)-3,2) == "20")
{ print $2 > Presidents2000 }
}' 

Here is my function running it:

SplitFile() {
    printf "Task 4: Splitting file based on century\n"
    awk -f $AFILE ${custFolder}/${month}/$DFILE
}

Here $AFILE is my awk file, and ${custFolder}/${month}/$DFILE is the path to my actual CSV file.

Here is a portion of my output. It is actually several hundred lines long, but this is what part of it looks like:

awk: presidentData/10/presidents.csv:47: 46,Joseph Biden,http:..en.wikipedia.org.wiki.Joe_Biden,20.01.2021,Incumbent ,  Democratic   , Joe_Biden.jpg,thmb_Joe_Biden.jpg,Pennsilvania
awk: presidentData/10/presidents.csv:47:                ^ syntax error
awk: presidentData/10/presidents.csv:47: 46,Joseph Biden,http:..en.wikipedia.org.wiki.Joe_Biden,20.01.2021,Incumbent ,  Democratic   , Joe_Biden.jpg,thmb_Joe_Biden.jpg,Pennsilvania
awk: presidentData/10/presidents.csv:47:                                                                  ^ syntax error
awk: presidentData/10/presidents.csv:47: 46,Joseph Biden,http:..en.wikipedia.org.wiki.Joe_Biden,20.01.2021,Incumbent ,  Democratic   , Joe_Biden.jpg,thmb_Joe_Biden.jpg,Pennsilvania
awk: presidentData/10/presidents.csv:47:                                                                                             ^ syntax error
awk: presidentData/10/presidents.csv:47: 46,Joseph Biden,http:..en.wikipedia.org.wiki.Joe_Biden,20.01.2021,Incumbent ,  Democratic   , Joe_Biden.jpg,thmb_Joe_Biden.jpg,Pennsilvania
awk: presidentData/10/presidents.csv:47:

I know the output is not very helpful; I would rather post a screenshot, but I can't. I have tried getting help, but these online classes can be really hard and getting help at a distance is tough. The syntax errors above seem to point at the commas in the CSV file.

Answer:

After the edits, it's clear you are trying to classify the presidents by century, writing each president's name to a file for the century in which they took office.

As stated in my comments above, you don't include single quotes or command-line options such as -F, in an awk script file; everything in the file is awk code. Instead, use a BEGIN {...} rule to set the field separator, FS = ",". Then there are several ways to split the fourth field; split() is as easy as anything else.
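To see the difference between the two forms, here is a small sketch, assuming a POSIX awk: the command-line form keeps -F, and the quotes on the shell command line, while the script-file form (a temporary file standing in for a hypothetical demo.awk) sets FS in a BEGIN rule. The helper names field2_cli and field2_script are hypothetical.

```shell
#!/bin/sh
# Command-line form: -F, and the single quotes are shell syntax, not awk syntax.
field2_cli() {
    awk -F, '{ print $2 }'
}

# Script-file form: the file holds only awk code, so FS is set in a BEGIN rule.
field2_script() {
    prog=$(mktemp)                # stands in for a hypothetical demo.awk
    cat > "$prog" <<'EOF'
BEGIN { FS = "," }
{ print $2 }
EOF
    awk -f "$prog"                # reads the data from stdin
    rm -f "$prog"
}

row='28,Woodrow Wilson,http:..en.wikipedia.org.wiki.Woodrow_Wilson,4.03.1913'
printf '%s\n' "$row" | field2_cli       # Woodrow Wilson
printf '%s\n' "$row" | field2_script    # Woodrow Wilson
```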

That will leave you with the year in which the president took office in the third element of arr (split() numbers the pieces starting at 1; it does not set arr[0]). Then it's just a matter of comparing against the largest year first and working down, redirecting the output to the file for that century.
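A quick way to check how split() numbers the pieces of a date such as 4.03.1913 is a sketch like this one; show_split is a hypothetical helper, and the one-character separator "." is treated literally rather than as a regex.

```shell
#!/bin/sh
# Hypothetical helper: show how split() numbers the pieces of a D.M.YYYY date.
show_split() {
    awk -v date="$1" 'BEGIN {
        n = split(date, arr, ".")        # a one-character separator is taken literally
        for (i = 1; i <= n; i++)
            printf "arr[%d] = %s\n", i, arr[i]
        print "arr[0] set? " ((0 in arr) ? "yes" : "no")
    }'
}

show_split 4.03.1913
# arr[1] = 4
# arr[2] = 03
# arr[3] = 1913
# arr[0] set? no
```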

Continuing with what you started, your awk script will look similar to:

#!/usr/bin/awk -f

BEGIN { FS = "," }

{ 
  split ($4, arr, ".")
  if (arr[3] >= 2000)
    print $2 > "Presidents2000"
  else if (arr[3] >= 1900)
    print $2 > "Presidents1900"
  else if (arr[3] >= 1800)
    print $2 > "Presidents1800"
  else if (arr[3] >= 1700)
    print $2 > "Presidents1700"
}

Now make it executable (for convenience). Presuming the script is in the file pres.awk:

$ chmod +x pres.awk

Now simply call the awk script passing the .csv filename as the argument, e.g.

$ ./pres.awk my.csv
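The error output in the question names presidents.csv as the file awk is complaining about, which is what awk reports when the CSV itself is being read as the program. One plausible cause, assumed here rather than confirmed by the question, is that $AFILE was empty, so the unquoted expansion vanished and -f picked up the CSV path instead. If you keep calling the script from your SplitFile function, a hedged sketch that quotes every expansion and fails early when AFILE is unset:

```shell
#!/bin/sh
# Sketch of SplitFile with quoted expansions and a guard on AFILE.
# AFILE, custFolder, month and DFILE come from the question; their values are up to your script.
SplitFile() {
    printf "Task 4: Splitting file based on century\n"
    if [ -z "$AFILE" ] || [ ! -r "$AFILE" ]; then
        # Without this check, an empty AFILE makes awk treat the CSV as its program.
        printf 'SplitFile: AFILE (%s) is empty or unreadable\n' "$AFILE" >&2
        return 1
    fi
    awk -f "$AFILE" "${custFolder}/${month}/${DFILE}"
}
```

For example, set AFILE=pres.awk (and the other three variables) before calling SplitFile.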

Now list the files named Presid* and see what is created:

$ ls -al Presid*
-rw-r--r-- 1 david david 33 Oct  8 22:28 Presidents1900

And verify that the contents are what you need:

$ cat Presidents1900
Woodrow Wilson
Warren G. Harding

Presumably that is the output you are looking for, based on your attempt.

(Note: you need to quote the output file name. Unquoted, Presidents1900 is taken as an awk variable that has never been set, so the redirection target is an empty string instead of a file name.)
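If you would rather not write one branch per century, the file name can also be built as a string expression. This is a hedged alternative to the if/else chain, not the method shown above: it computes the century from the year and stores "Presidents" followed by that number in a variable before redirecting. The function name split_by_century is hypothetical, and it assumes $4 is D.M.YYYY.

```shell
#!/bin/sh
# Hypothetical helper: write each name ($2) to Presidents<century> in the current directory.
# Assumes $4 is D.M.YYYY; rows without a numeric year in $4 are skipped.
split_by_century() {
    awk 'BEGIN { FS = "," }
    {
        n = split($4, arr, ".")
        if (n < 3 || arr[3] !~ /^[0-9]+$/)
            next                                  # e.g. a header row
        century = int(arr[3] / 100) * 100         # 1913 -> 1900, 2021 -> 2000
        out = "Presidents" century
        print $2 > out                            # the file stays open, so later rows append
    }' "$1"
}
```

Running split_by_century on the CSV should give the same Presidents1900-style files as the script above, with a file created only for centuries that actually appear in the data.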

Let me know if you have further questions.
