I use the following regex, in multiline mode, to collect all the names in a long text file:
regex = r'Name:\s*(.*)$'
names = re.findall(regex, file_content, re.MULTILINE)
The file contains several sections, and I need to collect names only up to a specific substring (for example, "Computers:"). I could do this in plain Python (e.g., by cutting file_content after the substring), but I am required to use regex only.
How can I do this?
Example of the text file:
Name: Jon
address: 1st
phone: 01321231231231
Name: Mon
address: 1st
phone: 01321231231231
Name: Gon
address: 1st
phone: 01321231231231
Computers:
Name: Jason
address: 1st
phone: 01321231231231
Name: Bason
address: 1st
phone: 01321231231231
Expected output: Jon, Mon, Gon
CodePudding user response:
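You can use
regex = r'Name:\s*(.*)(?=[\s\S]*Computers:)'
Here,
Name: - a fixed string
\s* - zero or more whitespace chars
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible
(?=[\s\S]*Computers:) - a positive lookahead: immediately to the right, there must be any zero or more chars (including line breaks) followed by the Computers: string
Note that the match is case-sensitive, so the literal must be spelled exactly as in the file (Computers: in the example), or re.IGNORECASE must be passed to re.findall.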
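As a quick check, the lookahead pattern can be run against the sample from the question. This is only a minimal sketch: the sample string is retyped from the question, and re.IGNORECASE is an assumption added here so that a lowercase computers: in the pattern still matches the capitalized Computers: header.

```python
import re

# Sample text retyped from the question (one field per line).
file_content = """Name: Jon
address: 1st
phone: 01321231231231
Name: Mon
address: 1st
phone: 01321231231231
Name: Gon
address: 1st
phone: 01321231231231
Computers:
Name: Jason
address: 1st
phone: 01321231231231
Name: Bason
address: 1st
phone: 01321231231231
"""

# The lookahead only lets a name match while "computers:" still lies ahead of it.
# re.IGNORECASE is an assumption so the capitalized header in the sample matches.
regex = r'Name:\s*(.*)(?=[\s\S]*computers:)'
names = re.findall(regex, file_content, re.IGNORECASE)
print(names)  # ['Jon', 'Mon', 'Gon']
```

If the header never appears in the text, the lookahead fails everywhere and the result is an empty list.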
CodePudding user response:
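import re
# Sample input, one field per line as in the question.
file_content = (
    "Name: Jon\naddress: 1st\nphone: 01321231231231\n"
    "Name: Mon\naddress: 1st\nphone: 01321231231231\n"
    "Name: Gon\naddress: 1st\nphone: 01321231231231\n"
    "Computers:\n"
    "Name: Jason\naddress: 1st\nphone: 01321231231231\n"
    "Name: Bason\naddress: 1st\nphone: 01321231231231\n"
)
# Match each "Name: ..." line only while "Computers:" still lies ahead of it.
names = re.findall(r'Name:.*\n(?=[\s\S]*Computers:)', file_content)
# Strip the "Name: " prefix and the trailing newline from each match.
final_name_list = [name.replace("Name: ", "").replace("\n", "") for name in names]
print(final_name_list)  # ['Jon', 'Mon', 'Gon']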
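A lookahead of the form [\s\S]*Computers: rescans the rest of the text for every candidate match, which may become slow on very large files. One alternative that still stays within the re module is to split on the header first and search only the leading part. This is a swapped-in technique, not what either answer above proposes, and whether it satisfies the "regex only" requirement is an assumption; the variable name head is hypothetical.

```python
import re

# Shortened sample in the same layout as the question's file.
file_content = (
    "Name: Jon\naddress: 1st\nphone: 01321231231231\n"
    "Name: Mon\naddress: 1st\nphone: 01321231231231\n"
    "Name: Gon\naddress: 1st\nphone: 01321231231231\n"
    "Computers:\n"
    "Name: Jason\naddress: 1st\nphone: 01321231231231\n"
)

# Keep only the text before the first "Computers:" header.
head = re.split(r'Computers:', file_content, maxsplit=1)[0]

# Collect names from that leading part; re.MULTILINE makes ^ and $ work per line.
names = re.findall(r'^Name:\s*(.*)$', head, re.MULTILINE)
print(names)  # ['Jon', 'Mon', 'Gon']
```

Unlike the lookahead version, this returns every name in the file when the header is missing, because re.split then leaves the text unsplit.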