A regex for finding a pattern (many occurrences) but only up to a substring

Time:12-16

I use this regex to collect all the names in a long, multi-line text file:

regex = r'Name:\s*(.*)$'
names = re.findall(regex, file_content, re.M)

The file contains several sections, and I need to collect names only up to a specific substring (for example, "computers:"). It is possible to do this with Python (e.g., cut the file_content after the substring), but for some reason, I must use regex only.

How?

Example of the text file:

Name:     Jon
  address: 1st 
  phone: 01321231231231
Name:     Mon
  address: 1st 
  phone: 01321231231231
Name:     Gon
  address: 1st 
  phone: 01321231231231

Computers:

Name:     Jason
  address: 1st 
  phone: 01321231231231
Name:     Bason
  address: 1st 
  phone: 01321231231231

Expected output: Jon, Mon, Gon

CodePudding user response:

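You can add a positive lookahead that only lets a match succeed while the stop substring still lies ahead in the text:

regex = r'Name:\s*(.*)(?=[\s\S]*computers:)'

The match is case-sensitive, so for the sample above, where the heading is spelled Computers:, either write it that way in the pattern or pass re.I to re.findall.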

Here,

  • Name: - a fixed string
  • \s* - zero or more whitespace chars
  • (.*) - Group 1: zero or more chars other than line break chars, as many as possible
  • (?=[\s\S]*computers:) - a positive lookahead: to the right of the current position there must be zero or more chars of any kind (line breaks included), followed by the string computers:
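To see the lookahead in action, here is a minimal sketch run against the sample text from the question. The variable names (sample, pattern, names) are only illustrative, and re.IGNORECASE is an assumption added so that computers: in the pattern matches the capitalized Computers: heading in the sample.

```python
import re

# Sample text from the question; "Computers:" marks where collection should stop.
sample = """Name:     Jon
  address: 1st
  phone: 01321231231231
Name:     Mon
  address: 1st
  phone: 01321231231231
Name:     Gon
  address: 1st
  phone: 01321231231231

Computers:

Name:     Jason
  address: 1st
  phone: 01321231231231
Name:     Bason
  address: 1st
  phone: 01321231231231
"""

# A name matches only if "computers:" still occurs somewhere after it.
# re.IGNORECASE is assumed here because the sample capitalizes the heading.
pattern = re.compile(r'Name:\s*(.*)(?=[\s\S]*computers:)', re.IGNORECASE)
names = pattern.findall(sample)
print(names)  # ['Jon', 'Mon', 'Gon']

# Without the flag the lowercase "computers:" never appears, so nothing matches.
case_sensitive = re.findall(r'Name:\s*(.*)(?=[\s\S]*computers:)', sample)
print(case_sensitive)  # []
```

Note that the lookahead rescans the rest of the text for every Name: candidate, so on very large files this may be noticeably slower than cutting the text once.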

CodePudding user response:

import re

file_content = """Name: Jon
  address: 1st
  phone: 01321231231231
Name: Mon
  address: 1st
  phone: 01321231231231
Name: Gon
  address: 1st
  phone: 01321231231231

Computers:

Name: Jason
  address: 1st
  phone: 01321231231231
Name: Bason
  address: 1st
  phone: 01321231231231
"""

# Keep only the text before the "Computers:" heading
head = re.split(r'Computers:', file_content, maxsplit=1)[0]

# Grab each "Name: ..." line, then strip the label and surrounding whitespace
names = re.findall(r'Name:.*\n', head)
final_name_list = [name.replace("Name:", "").strip() for name in names]

print(final_name_list)  # ['Jon', 'Mon', 'Gon']
