I'm trying to write a Python script that opens and reads a log file and pulls out specific information, such as:
MAC address, IP address, total number of ACKs
I know I'll need a regex to pull out this information; my current regex is:
"(([0-9a-f]{2}:){5}[0-9a-f]{2}) via (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
I'm still working on restricting the matches to DHCPACK lines only, but what I'm having a hard time with is pulling the IP address. Here is a snippet of the log I need to read and extract from:
Mar 15 09:17:29 linux1 dhcpd: uid lease 10.119.127.14 for client 2c:44:fd:02:02:3b is duplicate on 10.119.0/17
Mar 15 09:17:29 linux1 dhcpd: DHCPREQUEST for 10.119.79.13 (10.1.2.27) from 2c:44:fd:02:02:3b via 10.119.0.1
**Mar 15 09:17:29 linux1 dhcpd: DHCPACK on 10.119.79.13 to 2c:44:fd:02:02:3b via 10.119.0.1**
Mar 15 09:17:29 linux1 dhcpd: DHCPDECLINE of 10.119.79.13 from 2c:44:fd:02:02:3b via 10.119.0.1: not found
Mar 15 09:17:30 linux1 dhcpd: DHCPREQUEST for 10.172.219.117 from 84:8e:0c:1d:58:60 (iPhone) via 10.172.192.1
**Mar 15 09:17:30 linux1 dhcpd: DHCPACK on 10.172.219.117 to 84:8e:0c:1d:58:60 (iPhone) via 10.172.192.1**
Mar 15 09:17:30 linux1 dhcpd: DHCPDISCOVER from 9c:b7:0d:8a:29:65 via 10.145.192.1: network 10.145.192/19: no free leases
Mar 15 09:17:31 linux1 dhcpd: DHCPREQUEST for 10.119.222.25 from c8:f6:50:d6:ce:be (63074) via 10.119.192.1
Mar 15 09:17:31 linux1 dhcpd: DHCPACK on 10.119.222.25 to c8:f6:50:d6:ce:be (63074) via 10.119.192.1
I can pull the first ACK just fine, but the ACK lines don't all have the same format. For example, the first bolded line is 'MAC via IP', while the second bolded line is 'MAC (iPhone) via IP'.
I'm new to regexes, so I'm not sure whether one regex can match both patterns or whether I need a separate regex for each.
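A quick check with the standard `re` module (a sketch that pastes the two bolded lines in as sample strings; the variable names are illustrative) shows why the current regex finds the first ACK but not the second: the pattern requires " via " to follow the MAC address immediately, so the " (iPhone)" in between breaks the match.

```python
import re

# The regex from the question, unchanged
pattern = re.compile(r"(([0-9a-f]{2}:){5}[0-9a-f]{2}) via (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

plain_ack = "Mar 15 09:17:29 linux1 dhcpd: DHCPACK on 10.119.79.13 to 2c:44:fd:02:02:3b via 10.119.0.1"
iphone_ack = "Mar 15 09:17:30 linux1 dhcpd: DHCPACK on 10.172.219.117 to 84:8e:0c:1d:58:60 (iPhone) via 10.172.192.1"

m = pattern.search(plain_ack)
print(m.group(1), m.group(3))  # 2c:44:fd:02:02:3b 10.119.0.1

# "(iPhone)" sits between the MAC and "via", so the pattern cannot match this line
print(pattern.search(iphone_ack))  # None
```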
Answer:
Perhaps this can give you a start.
```python
import re

txt = """\
Mar 15 09:17:29 linux1 dhcpd: uid lease 10.119.127.14 for client 2c:44:fd:02:02:3b is duplicate on 10.119.0/17
Mar 15 09:17:29 linux1 dhcpd: DHCPREQUEST for 10.119.79.13 (10.1.2.27) from 2c:44:fd:02:02:3b via 10.119.0.1
Mar 15 09:17:29 linux1 dhcpd: DHCPACK on 10.119.79.13 to 2c:44:fd:02:02:3b via 10.119.0.1
Mar 15 09:17:29 linux1 dhcpd: DHCPDECLINE of 10.119.79.13 from 2c:44:fd:02:02:3b via 10.119.0.1: not found
Mar 15 09:17:30 linux1 dhcpd: DHCPREQUEST for 10.172.219.117 from 84:8e:0c:1d:58:60 (iPhone) via 10.172.192.1
Mar 15 09:17:30 linux1 dhcpd: DHCPACK on 10.172.219.117 to 84:8e:0c:1d:58:60 (iPhone) via 10.172.192.1
Mar 15 09:17:30 linux1 dhcpd: DHCPDISCOVER from 9c:b7:0d:8a:29:65 via 10.145.192.1: network 10.145.192/19: no free leases
Mar 15 09:17:31 linux1 dhcpd: DHCPREQUEST for 10.119.222.25 from c8:f6:50:d6:ce:be (63074) via 10.119.192.1
Mar 15 09:17:31 linux1 dhcpd: DHCPACK on 10.119.222.25 to c8:f6:50:d6:ce:be (63074) via 10.119.192.1"""

for ln in txt.splitlines():
    # Only DHCPACK lines are of interest
    if 'DHCPACK' not in ln:
        continue
    # Alternation: group 1 captures an IPv4 address, group 2 captures a MAC address
    parts = re.findall(r"(\d+\.\d+\.\d+\.\d+)|(([\da-f]+:){5}[\da-f]+)", ln)
    # First match is the leased IP (after "DHCPACK on"); second match is the client MAC
    print(parts[0][0], "---", parts[1][1])
```
Output:
10.119.79.13 --- 2c:44:fd:02:02:3b
10.172.219.117 --- 84:8e:0c:1d:58:60
10.119.222.25 --- c8:f6:50:d6:ce:be
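The question also asks for the total number of ACKs, which the loop above does not report. A minimal extension, assuming the same dhcpd line format, counts the DHCPACK lines and tallies them per MAC address with `collections.Counter` (the per-MAC tally is one possible reading of "total number of ACKs"; the log filename in the comment is a placeholder):

```python
import re
from collections import Counter

# Sample lines from the question; with a real file, iterate over open("your.log") instead
# ("your.log" is a placeholder name)
txt = """\
Mar 15 09:17:29 linux1 dhcpd: DHCPREQUEST for 10.119.79.13 (10.1.2.27) from 2c:44:fd:02:02:3b via 10.119.0.1
Mar 15 09:17:29 linux1 dhcpd: DHCPACK on 10.119.79.13 to 2c:44:fd:02:02:3b via 10.119.0.1
Mar 15 09:17:30 linux1 dhcpd: DHCPACK on 10.172.219.117 to 84:8e:0c:1d:58:60 (iPhone) via 10.172.192.1
Mar 15 09:17:31 linux1 dhcpd: DHCPACK on 10.119.222.25 to c8:f6:50:d6:ce:be (63074) via 10.119.192.1"""

ip_re = re.compile(r"\d+\.\d+\.\d+\.\d+")
mac_re = re.compile(r"(?:[\da-f]{2}:){5}[\da-f]{2}")  # non-capturing, so .group() is the whole MAC

acks_per_mac = Counter()
for ln in txt.splitlines():
    if "DHCPACK" not in ln:
        continue
    ip = ip_re.search(ln).group()    # first IPv4 on the line: the address after "DHCPACK on"
    mac = mac_re.search(ln).group()  # client MAC, with or without a "(name)" after it
    print(ip, "---", mac)
    acks_per_mac[mac] += 1

total_acks = sum(acks_per_mac.values())
print("Total ACKs:", total_acks)
```

Using two separate `search` calls avoids depending on the order of the tuples that `findall` returns for an alternation.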
Answer:
You can use this regex pattern:
(([0-9a-f]{2}:){5}[0-9a-f]{2})( \(\w+\))? via (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
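It also matches lines that have an extra word in parentheses, such as (iPhone) or (63074), between the MAC address and the word via; that part is made optional with ?. The IP address it captures is the one after via.
For the sample log, the regex produces the following matches (it also matches the DHCPREQUEST, DHCPDECLINE and DHCPDISCOVER lines, since it does not check for DHCPACK):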
2c:44:fd:02:02:3b via 10.119.0.1
2c:44:fd:02:02:3b via 10.119.0.1
2c:44:fd:02:02:3b via 10.119.0.1
84:8e:0c:1d:58:60 (iPhone) via 10.172.192.1
84:8e:0c:1d:58:60 (iPhone) via 10.172.192.1
9c:b7:0d:8a:29:65 via 10.145.192.1
c8:f6:50:d6:ce:be (63074) via 10.119.192.1
c8:f6:50:d6:ce:be (63074) via 10.119.192.1
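Both answers can be combined into a single pattern that only matches DHCPACK lines, so no separate filtering step is needed. This is a sketch assuming the lines follow the dhcpd format shown in the question. It uses named groups (the names `ip`, `mac`, `label` and `via` are arbitrary choices) and captures both the address after "DHCPACK on" and the address after "via", since the question doesn't say which one "IP address" refers to. `label` is `None` when the line has no parenthesised name.

```python
import re

# One pattern for both ACK formats; the " (...)" part before "via" is optional
ack_re = re.compile(
    r"DHCPACK on (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"to (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
    r"(?: \((?P<label>[^)]*)\))?"
    r" via (?P<via>\d{1,3}(?:\.\d{1,3}){3})"
)

lines = [
    "Mar 15 09:17:29 linux1 dhcpd: DHCPREQUEST for 10.119.79.13 (10.1.2.27) from 2c:44:fd:02:02:3b via 10.119.0.1",
    "Mar 15 09:17:29 linux1 dhcpd: DHCPACK on 10.119.79.13 to 2c:44:fd:02:02:3b via 10.119.0.1",
    "Mar 15 09:17:30 linux1 dhcpd: DHCPACK on 10.172.219.117 to 84:8e:0c:1d:58:60 (iPhone) via 10.172.192.1",
    "Mar 15 09:17:31 linux1 dhcpd: DHCPACK on 10.119.222.25 to c8:f6:50:d6:ce:be (63074) via 10.119.192.1",
]

acks = []
for ln in lines:
    m = ack_re.search(ln)
    if m:  # non-ACK lines (e.g. DHCPREQUEST) simply don't match
        acks.append(m.groupdict())

for ack in acks:
    print(ack["mac"], ack["ip"], ack["label"], ack["via"])
print("Total ACKs:", len(acks))
```

Using `[^)]*` inside the parentheses instead of `\w+` also accepts names containing hyphens or spaces, which `\w` does not match.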