I'm looking for a way to extract strings from a text file using specific criteria

Time:11-25

I have a text file containing random strings. I want to extract the strings that match specific criteria.

Example text :

B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293

Example criteria :

All the strings that contain characters separated by hyphens this way: XXXX-XX-XXXX

Output : 'B311-SG-1700'

I tried creating a function, but I can't figure out how to express criteria for strings specifically, or how to apply them.

CodePudding user response:

You can use the re module to extract the pattern from the text:

import re

text = """\
B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293
BAKSJD873-JAN-1293 B312-SG-1700-ASJND83-ANSDN762"""

for m in re.findall(r"\b.{4}-.{2}-.{4}", text):
    print(m)

Prints:

B311-SG-1700
B312-SG-1700
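
One caveat with the pattern above: `.` matches any character except a newline, so spaces and other separators also count toward the 4-2-4 lengths. A minimal sketch of a stricter variant follows, assuming the fields contain only ASCII letters and digits (that restriction is my assumption, not something stated in the question). The names `loose`, `strict` and `tricky` are made up for illustration.

```python
import re

# The pattern from the answer: "." accepts any character except a newline.
loose = re.compile(r"\b.{4}-.{2}-.{4}")

# Assumed stricter variant: only ASCII letters and digits inside each field.
strict = re.compile(r"\b[A-Za-z0-9]{4}-[A-Za-z0-9]{2}-[A-Za-z0-9]{4}")

sample = "B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293"
tricky = "AB C-DE-F GH"  # spaces happen to line up with the 4-2-4 layout

sample_strict = strict.findall(sample)  # same result as the loose pattern here
tricky_loose = loose.findall(tricky)    # the loose pattern accepts the spaces
tricky_strict = strict.findall(tricky)  # the strict pattern rejects them

print(sample_strict)
print(tricky_loose)
print(tricky_strict)
```

On the example text both patterns give the same result; they only differ on input where spaces or punctuation fall inside a field.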

CodePudding user response:

Based on your comment, here is a Python script that might do what you want (I'm not that familiar with Python).

import re

p = re.compile(r'\b(.{4}-.{2}-.{4})')

results = p.findall(
    'B111-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293\n'
    'B211-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293 '
    'B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293'
)

print(results)

Output: ['B111-SG-1700', 'B211-SG-1700', 'B311-SG-1700']

You can read a file as a string like this:

# "with" closes the file automatically once the block ends
with open("file.txt", "r") as text_file:
    data = text_file.read()

Then call findall on that string. Depending on the size of the file, it might require a bit more work (e.g. reading it line by line instead of loading it all at once).
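
For large files, a line-by-line version might look like the sketch below. It assumes a match never spans two lines and that the fields are ASCII letters and digits. `iter_matches` and `PATTERN` are hypothetical names, and `io.StringIO` stands in for a real file so the example runs on its own.

```python
import io
import re

# Assumed pattern: 4-2-4 letters or digits separated by hyphens.
PATTERN = re.compile(r"\b[A-Za-z0-9]{4}-[A-Za-z0-9]{2}-[A-Za-z0-9]{4}")


def iter_matches(lines):
    """Yield every match, scanning one line at a time."""
    for line in lines:
        yield from PATTERN.findall(line)


# io.StringIO behaves like an open text file here. With a file on disk:
#     with open("file.txt", "r") as text_file:
#         results = list(iter_matches(text_file))
sample = io.StringIO(
    "B111-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293\n"
    "BAKSJD873-JAN-1293 B211-SG-1700-ASJND83-ANSDN762\n"
)

results = list(iter_matches(sample))
print(results)  # ['B111-SG-1700', 'B211-SG-1700']
```

Iterating over the file object reads one line at a time, so memory use stays roughly constant regardless of file size.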
