Home > Back-end >  Count IP on URLs begins with "domain/product" in Apache access_logs
Count IP on URLs begins with "domain/product" in Apache access_logs

Time:01-01

I try to count the access on a specific URL which begins every time with "shop/product?traffic=ads" with AWK, but I failed.

The following code gives me a counter how often an IP address has accessed these URL:

awk -F'[ "] ' '$7 == "/shop/product?traffic=ads" { ipcount[$1]   }
END { for (i in ipcount) {
    printf "s - %d\n", i, ipcount[i] } }' /var/www/vhosts/domain.com/logs/access_ssl_log

An example for the access_log (input-file) is here:

66.249.68.xx- - [19/Dec/2022:09:14:15  0100] "GET /shop/other-product/1.0" 404 16996 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.xxx Mobile Safari/537.36 (compatible; Googlebot/2.1;  http://www.google.com/bot.html)"
109.42.242.xxx - - [19/Dec/2022:09:14:55  0100] "GET /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
80.187.75.xx - - [20/Dec/2022:06:40:12  0100] "GET /shop/product HTTP/1.0" 200 10821 "https://www.example.com/shop/product?traffic=ads&gclid=EAIaIQobChMIg_Ks5vWF_AIVAgGLCh3k_gBKEAAYASAAEgKBOfD_BwE&dt=1671461107791" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"

The "gclid" and and the "dt"(session cookie) are dynamic.

I try to play with ^ after ads, before /shop, but there will be no results.

I want for example the following output:

6 Clicks from 109.42.242.xxx to /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB
1 Clicks from 80.187.75.xx to https://www.example.com/shop/product?traffic=ads&gclid=EAIaIQobChMIg_Ks5vWF_AIVAgGLCh3k_gBKEAAYASAAEgKBOfD_BwE&dt=1671461107791"

CodePudding user response:

You can check if the string occurs in field 7 using index(), and then store the values of field 1 and field 7 with a space in between as the key, to retrieve both values in the END block by splitting on a space again.

awk -F'[ "] ' 'index($7,  "/shop/product?traffic=ads") { ipcount[$1 " " $7]   }

END { for (i in ipcount) {
    parts = split(i, a, " ")
    printf ipcount[i] " Clicks from " a[1] " to " a[2] "\n"
  }
}' file

Test data

66.249.68.xx- - [19/Dec/2022:09:14:15  0100] "GET /shop/other-product/1.0" 404 16996 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.xxx Mobile Safari/537.36 (compatible; Googlebot/2.1;  http://www.google.com/bot.html)"
109.42.242.xxx - - [19/Dec/2022:09:14:55  0100] "GET /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
109.42.242.xxx - - [19/Dec/2022:09:15:55  0100] "GET /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
80.187.75.xx - - [20/Dec/2022:06:40:12  0100] "GET /shop/product HTTP/1.0" 200 10821 "https://www.example.com/shop/product?traffic=ads&gclid=EAIaIQobChMIg_Ks5vWF_AIVAgGLCh3k_gBKEAAYASAAEgKBOfD_BwE&dt=1671461107791" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"
109.42.242.xxx - - [19/Dec/2022:09:15:55  0100] "GET /shop/product?traffic=ads&gclid=Aj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"

Output

1 Clicks from 109.42.242.xxx to /shop/product?traffic=ads&gclid=Aj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB
2 Clicks from 109.42.242.xxx to /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB

CodePudding user response:

With your shown samples please try following awk code. Using match function to match regex \/shop\/product\?traffic=ads\S (where escaped / to match literal /) and if match is found then creating an array value with index of $1 FS and matched value. In the END block of this program printing the values as per requirement.

awk '
match($7,/\/shop\/product\?traffic=ads\S /){
  value[$1 FS substr($7,RSTART,RLENGTH)]  
}
END{
  for(i in value){
    split(i,arr)
    print value[i] " Clicks from " arr[1]  " to " arr[2]
  }
}
'  Input_file
  • Related