Home > Software engineering >  How can I use parenthesis while using the (?P<name>...) regular expression?
How can I use parenthesis while using the (?P<name>...) regular expression?

Time:10-21

I want to specify something from the expression after the (?P<name>...) like:

(?P<name>\d. ([a-z]))

but it is not allowing me. Is there a way to use the (?P<name>...) expression with parenthesis? Sample text:

197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web services HTTP/2.0" 203 26554
156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
168.95.156.240 - stark2413 [21/Jun/2019:15:45:31 -0700] "GET /engage HTTP/2.0" 201 9645
71.172.239.195 - dooley1853 [21/Jun/2019:15:45:32 -0700] "PUT /cutting-edge HTTP/2.0" 406 24498
180.95.121.94 - mohr6893 [21/Jun/2019:15:45:34 -0700] "PATCH /extensible/reinvent HTTP/1.1" 201 27330
144.23.247.108 - auer7552 [21/Jun/2019:15:45:35 -0700] "POST /extensible/infrastructures/one-to-one/enterprise HTTP/1.1" 100 

Let's say I only want the date part

z=re.finditer('(?P<date>\[\d . \])',logdata)
        for l in z:
         print(l.groupdict()) 

this allows me to get the date but with the square brackets, how can I specify that I don't want the square brackets with out using the brackets?? because if I use the brackets an error pops out

z=re.finditer('(?P<date>\[(\d . )\])',logdata)
        for l in z:
         print(l.groupdict())

it just returns the date with square brackets and ignores my parenthesis here

import re
def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
        k=re.finditer('(?P<request>\". \")',logdata)
        for request in k:
            a=request.groupdict()
        q=re.finditer('(?P<time>\d /. \s-\d )',logdata)
        for time in q:
            b=time.groupdict()
        g=re.finditer('(?P<host>\d \.\d \.\d \.\d )',logdata)
        for host in g:
            c=host.groupdict()
        f= re.finditer('(?P<user_name>\s[a-z] \d )',logdata)
        for user_name in f:
            d=user_name.groupdict()
        d.update(a)
        d.update(b)
        d.update(c)
        return d

that's the code for the entire assignment.

CodePudding user response:

use pattern (.*?) to match anything inside brackets.

import re

regex = r'\[(.*?)\]'

text = '''197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web services HTTP/2.0" 203 26554
156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
168.95.156.240 - stark2413 [21/Jun/2019:15:45:31 -0700] "GET /engage HTTP/2.0" 201 9645
71.172.239.195 - dooley1853 [21/Jun/2019:15:45:32 -0700] "PUT /cutting-edge HTTP/2.0" 406 24498
180.95.121.94 - mohr6893 [21/Jun/2019:15:45:34 -0700] "PATCH /extensible/reinvent HTTP/1.1" 201 27330
144.23.247.108 - auer7552 [21/Jun/2019:15:45:35 -0700] "POST /extensible/infrastructures/one-to-one/enterprise HTTP/1.1" 100'''

#print(re.findall(regex, text))

test = []
data = re.finditer(regex, text)
for i in data:
    test.append(i.group(1))
print(test)

>>> ['21/Jun/2019:15:45:25 -0700', '21/Jun/2019:15:45:27 -0700', '21/Jun/2019:15:45:28 -0700', '21/Jun/2019:15:45:31 -0700', '21/Jun/2019:15:45:32 -0700', '21/Jun/2019:15:45:34 -0700', '21/Jun/2019:15:45:35 -0700']
  • Related