Python RegEx: how to find words that start with an uppercase letter followed by a lowercase letter?

Time:11-23

I have the following string:

Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01

I'm using findall as a way to split my string, but it splits IP into I and P. How can I change the expression to get this output?

['Date: 20/8/2020 ', 'Duration: 0.33 ', 'IP: 110.1.x.x ', 'Server:01']

This is my current code:

import re

text = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"
my_list = re.findall('[a-zA-Z][^A-Z]*', text)
print(my_list)

which prints:

['Date: 20/8/2020 ', 'Duration: 0.33 ', 'I', 'P: 110.1.x.x ', 'Server:01']

Answer:

Look for a substring that begins with an uppercase letter followed by another letter of either case (so both "Da" and "IP" qualify), then match lazily until that same pattern appears again or the string ends.

>>> re.findall(r'([A-Z][a-zA-Z].*?)\s*(?=[A-Z][a-zA-Z]|$)', text)
['Date: 20/8/2020', 'Duration: 0.33', 'IP: 110.1.x.x', 'Server:01']
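Since the question frames this as splitting, the same lookahead can also drive re.split directly. This is a sketch of an alternative, not part of the answer above, and it assumes every key is preceded by at least one whitespace character:

```python
import re

text = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"

# Split on a run of whitespace only when it is followed by an uppercase
# letter and another letter. The lookahead consumes nothing, so the key
# names stay intact at the start of each piece.
parts = re.split(r'\s+(?=[A-Z][a-zA-Z])', text)
print(parts)
```

The spaces after "Date:" and "IP:" are not split points because the characters after them are digits, not an uppercase letter plus a letter.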

You may also wish to use this to create a dictionary.

>>> dict(re.split(r'\s*:\s*', m, maxsplit=1) for m in re.findall(r'([A-Z][a-zA-Z].*?)\s*(?=[A-Z][a-zA-Z]|$)', text))
{'Date': '20/8/2020', 'Duration': '0.33', 'IP': '110.1.x.x', 'Server': '01'}

Answer:

With regex you should be as precise as possible. So if you know that your input data always looks like this, I would suggest spelling out the full key names in the pattern.
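A minimal sketch of that suggestion, assuming Date, Duration, IP and Server are the only keys that ever occur and that no value contains a space:

```python
import re

text = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"

# Spell out every expected key; a key not listed here will simply not match.
# The value is taken to be the following run of non-whitespace characters.
pattern = r'\b(?:Date|Duration|IP|Server):\s*\S+'
fields = re.findall(pattern, text)
print(fields)
```

Any new key has to be added to the alternation by hand, which is the price of this precision.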

If that's not what you want, you have to give up some certainty:

  1. Change the regex to accept any word made of letters of either case, at any position
  2. Allow a capital P as the following letter
  3. Handle IP as a special case
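One possible reading of these three options, each applied to the pattern from the question. The exact patterns are my own interpretation; the answer does not spell them out:

```python
import re

text = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"

# Option 1: a key may be any run of letters, in either case
option1 = re.findall(r'[a-zA-Z]+[^A-Z]*', text)

# Option 2: the first letter may be followed by a capital P
option2 = re.findall(r'[a-zA-Z]P?[^A-Z]*', text)

# Option 3: try the literal "IP" first, otherwise fall back to a single letter
option3 = re.findall(r'(?:IP|[a-zA-Z])[^A-Z]*', text)

for result in (option1, option2, option3):
    print(result)
```

On this input all three return the list asked for (trailing spaces included), but each loosens the pattern differently: option 1 would also merge adjacent capitalised words such as "FooBar", and option 2 would attach a capital P to any preceding letter.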

Answer:

You can use:

(?<!\S)[A-Z][a-zA-Z]*:\s*\S+

Explanation

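  • (?<!\S) Assert a whitespace boundary to the left (the match is not preceded by a non-whitespace char)
  • [A-Z][a-zA-Z]*: Match an uppercase char A-Z, optional chars a-zA-Z followed by :
  • \s*\S+ Match optional whitespace chars and 1 or more non-whitespace chars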


import re

pattern = r"(?<!\S)[A-Z][a-zA-Z]*:\s*\S+"
s = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"
print(re.findall(pattern, s))

Output

['Date: 20/8/2020', 'Duration: 0.33', 'IP: 110.1.x.x', 'Server:01']
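If you want the keys and values separately, a small variation (my own, not part of the answer) wraps each part of the same pattern in a capture group. With two groups, re.findall returns (key, value) tuples, which dict() accepts directly:

```python
import re

s = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"

# Same pattern as above, with the key and the value each in a capture group.
pattern = r"(?<!\S)([A-Z][a-zA-Z]*):\s*(\S+)"
record = dict(re.findall(pattern, s))
print(record)
```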

Answer:

OK, I think I see your issue: you have not imported the re module. Import it first:

import re

Then try your code again.
