I have a text file containing random strings. I want to use specific criteria to extract the strings that match them.
Example text :
B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293
Example criterion:
All strings that contain characters separated by hyphens in this way: XXXX-XX-XXXX
Output : 'B311-SG-1700'
I tried writing a function, but I can't figure out how to express criteria for strings or how to apply them.
CodePudding user response:
You can use the re module to extract the pattern from the text:
import re
text = """\
B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293
BAKSJD873-JAN-1293 B312-SG-1700-ASJND83-ANSDN762"""
for m in re.findall(r"\b.{4}-.{2}-.{4}", text):
    print(m)
Prints:
B311-SG-1700
B312-SG-1700
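Note that `.` matches any character except a newline, including spaces and punctuation. If the pieces are always letters and digits (an assumption, the question does not say so), a character class is a bit stricter. A minimal sketch under that assumption, where `STRICT_PATTERN` is just an illustrative name:

```python
import re

# Assumption: each piece consists only of ASCII letters and digits.
STRICT_PATTERN = re.compile(r"\b[A-Za-z0-9]{4}-[A-Za-z0-9]{2}-[A-Za-z0-9]{4}")

text = """\
B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293
BAKSJD873-JAN-1293 B312-SG-1700-ASJND83-ANSDN762"""

# findall returns every non-overlapping match, left to right.
matches = STRICT_PATTERN.findall(text)
print(matches)  # ['B311-SG-1700', 'B312-SG-1700']
```

For this sample text the result is the same as with `.`; the difference only shows up on input that contains other characters around the hyphens.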
CodePudding user response:
Based on your comment, here is a Python script that might do what you want (I'm not that familiar with Python).
import re
p = re.compile(r'\b(.{4}-.{2}-.{4})')
results = p.findall('B111-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293\nB211-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293 B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293')
print(results)
Output: ['B111-SG-1700', 'B211-SG-1700', 'B311-SG-1700']
You can read a file into a string like this:
with open("file.txt", "r") as text_file:
    data = text_file.read()
And use findall over that. Depending on the size of the file, it might require a bit more work (e.g. reading line by line).
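For a large file, reading line by line could look roughly like this. It is only a sketch: `extract_codes` and `PATTERN` are made-up names, and it reuses the same regex as above.

```python
import re

# Same pattern as in the answer: 4 chars, hyphen, 2 chars, hyphen, 4 chars.
PATTERN = re.compile(r"\b.{4}-.{2}-.{4}")


def extract_codes(path):
    """Collect every match from the file at `path`, one line at a time."""
    found = []
    with open(path, "r") as text_file:
        # Iterating over a file object yields one line at a time,
        # so the whole file never has to sit in memory at once.
        for line in text_file:
            found.extend(PATTERN.findall(line))
    return found
```

Calling `extract_codes("file.txt")` should give the same list as running findall over the whole file, since `.` does not match a newline and a match can therefore never span two lines.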