I have an access.log file with more than 1 million lines. An example line:
113.10.154.38 - - [27/May/2016:03:36:26 0200] "POST /index.php?option=com_jce&task=plugin&plugin=imgmanager&file=imgmanager&method=form&cid=20&6bc427c8a7981f4fe1f5ac65c1246b5f=cf6dd3cf1923c950586d0dd595c8e20b HTTP/1.1" 200 22 "-" "BOT/0.1 (BOT for JCE)" "-"
I need to parse the log lines and count the 10 most common URLs, but with the query parameters removed from each URL. Without removing the query parameters, I wrote this code:
awk '{print $7}' test.log | sort | uniq -c | sort -rn | \
head | awk '{print NR,"\b. URL:", $2,"\n Requests:", $1}'
But I don't know how to remove the query parameters and count the top 10 most common URLs without them, to get a clean ranking of requests.
CodePudding user response:
Use the sub() function to remove a pattern from a string.
You need to do this at the point where you extract the field, before the values are sorted and counted, so that URLs differing only in their query string are counted as the same URL.
awk '{sub(/\?.*/, "", $7); print $7}' test.log | sort | uniq -c | sort -rn | ...
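For completeness, here is a minimal sketch of how the whole pipeline might look once the sub() call is in place. The function name top_urls is hypothetical, and the sketch assumes the request URL is always field 7, as in the sample line from the question. It uses printf for the final formatting instead of the "\b" trick, which is a stylistic swap and not something the original code requires.

```shell
# Hypothetical helper: print the 10 most requested paths in a log file,
# ignoring query strings. Assumes the URL is field 7 of each line.
top_urls() {
    # Strip everything from the first "?" onward in field 7, then print it.
    awk '{ sub(/\?.*/, "", $7); print $7 }' "$1" |
        # Group identical paths, count them, and order by count, highest first.
        sort | uniq -c | sort -rn | head -n 10 |
        # uniq -c emits "count path"; reformat into a numbered list.
        awk '{ printf "%d. URL: %s\n   Requests: %d\n", NR, $2, $1 }'
}
```

It would be called as top_urls test.log. If a URL has no "?", sub() finds nothing to replace and the field is printed unchanged.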