The awk script should produce a table that lists words with more than 4 letters and more than 2 occurrences. The last line of the output should display the number of words in the file.
Input file
In navigation, the heading of a vessel or aircraft is the compass direction
in which the craft's bow or nose is pointed. Note that the heading may not
#necessarily be the direction that the vehicle actually travels, which is
known as its course or track. Any difference between the heading and
course is due to the motion of the underlying medium, the air or water,
or other effects like skidding or slipping. The difference is known as
the drift, and can be determined by the wind triangle. At least seven ways
to measure the heading of a vehicle have been described. A compass installed
in a vehicle or vessel has a certain amount of error caused by the magnetic
properties of the vessel. This error is known as compass deviation. The
magnitude of the compass deviation varies greatly depending upon the local
anomalies created by the vessel. A fiberglass recreational vessel will
#generally have much less compass deviation than a steel-hulled vessel.
Electrical wires carrying current have a small magnetic field around them
and can cause deviation. Any type of magnet, such as found in a speaker
can also cause large magnitudes of compass deviation. The error can be
corrected using a deviation table. Deviation tables are very difficult to
create. Once a deviation table is established, it is only good for that
particular vessel, with that particular configuration. If electrical wires
are moved or anything else magnetic (speakers, electric motors, etc.) are
moved, the deviation table will change. All deviations in the deviation
table are indicated west or east. If the compass is pointing west of the
Magnetic North Pole, then the deviation is westward. If the compass is
pointing east of the Magnetic North Pole, then the deviation is eastward.
Write an awk script that produces a report from an input file. The report counts the number of times a word occurs in the input file.
Separately implement the additional functionality that ignores punctuation characters such as ".,;:()".
cat input.txt | awk -F" " '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}'
Outcome I expect: (image not reproduced)
CodePudding user response:
NOTE: this looks (to me) like a homework assignment so I'm just going to address current code.
awk '{ for (i=1;i<=NF;i++) a[$i]++ }              # remains the same
END  { for (k in a) {
           total += a[k]                          # keep track of total word count
           if ( a[k] > 2 && length(k) > 4 )       # apply filters to limit output
              print k,a[k]
       }
       printf "\nTotal: %s\n", total
     }
' input.txt
This generates:
compass 8
deviation 8 # 1 for 'Deviation' ?
deviation. 3 # need to strip "."
error 3
heading 4
known 3
magnetic 3 # 2 for 'Magnetic' ?
table 3 # 1 for 'table.' ?
vehicle 3
vessel 3 # 1 for 'vessel,' ?
vessel. 3 # need to strip "."
# 2 for 'moved' & 'moved.' ?
# 2 for 'electrical' & 'Electrical' ?
Total: 292
NOTES:
- no sorting requirements have been provided (I piped through sort -f to generate this output so it's easier to see where potential issues need to be addressed)
- assuming OP's desired output (in the image) is a subset of what's expected (eg, 'vehicle' does not show up in OP's output; my script found 'compass 8' while OP's (image) shows 'compass 7')
- assuming 'number of words in file' applies to ALL words and not just those of length > 4 and count > 2 (I found 292 while OP's output shows 271); otherwise OP will need to add logic to determine if a[$i]++ should be performed
- OP needs to implement additional logic to strip out punctuation before performing the a[$i]++ (hint: gsub() or gensub() functions)
- does OP need to implement case-insensitive storage of words for counting purposes? if so this will increase the number of hits for some words (eg, 'magnetic' and 'deviation') (hint: tolower() function)
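For what it's worth, the last two hints could be combined roughly as follows. This is only a sketch under assumptions OP has not confirmed: that counting should be case-insensitive, and that the characters to strip are exactly the ".,;:()" listed in the question. The function name count_words and the array name count are made up for illustration. The filtered lines are piped through sort -f from inside awk so that the total still comes out last.

```shell
# Sketch only: count_words is a hypothetical helper name. Lowercasing words and
# stripping ".,;:()" are assumptions based on the hints above, not OP's spec.
count_words() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            word = tolower($i)             # fold case so "Deviation" and "deviation" match
            gsub(/[.,;:()]/, "", word)     # drop the punctuation listed in the question
            if (word != "")                # skip fields that were only punctuation
                count[word]++
        }
    }
    END {
        for (w in count) {
            total += count[w]              # total covers ALL words, filtered or not
            if (count[w] > 2 && length(w) > 4)
                print w, count[w] | "sort -f"
        }
        close("sort -f")                   # flush the sorted lines before the total
        printf "Total: %d\n", total
    }' "$@"
}
```

It can be run as count_words input.txt, or fed from a pipe. Characters outside that set (for example the hyphen in "steel-hulled" or the apostrophe in "craft's") are left alone, so such words are still counted as written.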
CodePudding user response:
$ cat awk.script
#!/usr/bin/env awk -f
BEGIN {
    print "\tWord Count\n--------------------"
} /vessel/ {
    vessel++; next
} /compass/ {
    compass++
} /known/ {
    known++
} /table/ {
    table++
} /heading/ {
    heading++
} /magnetic/ {
    mag++
} /deviation/ {
    dev++
} /error/ {
    error++
} END {
    print "\tvessel "vessel "\n\tcompass "compass "\n\tknown "known "\n\ttable "table "\n\theading "heading "\n\tmagnetic "mag "\n\tdeviation "dev "\n\terror "error "\n--------------------\nNumber of words: " total
} ; {
    total=vessel+compass+known+table+heading+mag+dev+error
}
$ awk -f awk.script RS=" " input_file
Word Count
--------------------
vessel 7
compass 8
known 3
table 5
heading 4
magnetic 3
deviation 12
error 3
--------------------
Number of words: 45