How to validate that a file name starts with a certain prefix format?

Time:05-07

I have a list of files with names like this.

["TYBN-220422-257172171.txt", "TYBN-120522-257172174.txt", "TYBN-320422-657172171.txt", "TYBN-220622-237172174.txt", "TYBN-FRTRE-FFF.txt",....]

I want to keep only the files whose names have the format TYBN-220422-257172171.txt.

In other words, something like valid = "TYBN-{}-{}".format(numericvalue, numericvalue): I want only files of this type in the list.
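One way to turn the "TYBN-{}-{}" template idea into an actual check is to escape the literal parts of the template and substitute a digit pattern for each {} placeholder. This is a minimal sketch; the helper name template_to_regex is hypothetical, and it assumes each {} stands for one or more ASCII digits and that every name ends in .txt:

```python
import re

def template_to_regex(template: str, suffix: str = ".txt") -> re.Pattern:
    # Hypothetical helper: escape the literal text around each "{}" and
    # replace every placeholder with "one or more ASCII digits".
    literal_parts = template.split("{}")
    body = "[0-9]+".join(re.escape(part) for part in literal_parts)
    return re.compile(body + re.escape(suffix))

pattern = template_to_regex("TYBN-{}-{}")
names = ["TYBN-220422-257172171.txt", "TYBN-FRTRE-FFF.txt"]
# fullmatch requires the whole name to fit the pattern
print([n for n in names if pattern.fullmatch(n)])
```

Escaping the literal parts means a template containing regex metacharacters (such as a dot) is still matched literally.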

Answer:

Regex explanation:

  • ^ start of the string
  • $ end of the string
  • \d matches a single digit (for plain ASCII names, equivalent to [0-9])
  • + one or more of the preceding token
import re

files = ["TYBN-220422-257172171.txt", "TYBN-120522-257172174.txt"]

# Raw string so the backslashes reach the regex engine unchanged
pattern = re.compile(r"^TYBN-\d+-\d+\.txt$")

for f in files:
    if pattern.match(f):
        print(f + " matched naming convention.")
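The tokens listed above can also be written one per line with the re.VERBOSE flag, which ignores whitespace inside the pattern and allows # comments. A sketch of the same check (note the + after each \d, meaning "one or more digits"):

```python
import re

# Same pattern as above, written with re.VERBOSE so each token
# can carry its own comment. Whitespace in the pattern is ignored.
pattern = re.compile(r"""
    ^        # start of the string
    TYBN-    # literal prefix
    \d+      # one or more digits
    -        # literal separator
    \d+      # one or more digits
    \.txt    # literal ".txt" (the dot is escaped)
    $        # end of the string
""", re.VERBOSE)

files = ["TYBN-220422-257172171.txt", "TYBN-FRTRE-FFF.txt"]
for f in files:
    if pattern.match(f):
        print(f + " matched naming convention.")
```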

Answer:

This is probably most easily done using a regex that matches the desired format, i.e.

TYBN-\d+-\d+\.txt$

which looks for a name starting with the characters TYBN-, followed by one or more digits (\d+), a -, more digits, and finally .txt at the end of the string.

Note that when using re.match (as in the code below), matches are automatically anchored to the start of the string and thus a leading ^ (start-of-string anchor) is not required on the regex.
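A quick way to see this anchoring is to compare re.match with re.search on a name that has extra text before the prefix (the name below is made up for illustration):

```python
import re

regex = re.compile(r"TYBN-\d+-\d+\.txt$")
name = "old_TYBN-220422-257172171.txt"  # illustrative name with a leading "old_"

# re.match only tries at position 0, so the "old_" prefix makes it fail.
print(regex.match(name))                         # None
# re.search scans the whole string, so it finds the match after "old_".
print(regex.search(name) is not None)            # True
# With re.search, an explicit ^ is needed to pin the match to the start.
print(re.search(r"^TYBN-\d+-\d+\.txt$", name))   # None
```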

In Python:

import re
filelist = ["TYBN-220422-257172171.txt",
            "TYBN-120522-257172174.txt",
            "TYBN-320422-657172171.txt",
            "TYBN-220622-237172174.txt",
            "TYBN-FRTRE-FFF.txt"
           ]
# re.match anchors at the start; $ anchors at the end
regex = re.compile(r'TYBN-\d+-\d+\.txt$')
valid = [file for file in filelist if regex.match(file)]
print(valid)

Output:

[
 'TYBN-220422-257172171.txt',
 'TYBN-120522-257172174.txt',
 'TYBN-320422-657172171.txt',
 'TYBN-220622-237172174.txt'
]
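One caveat with $: in Python it also matches just before a single trailing newline, so a name with a stray \n (for example, one read line by line from a text file without stripping) still passes. re.fullmatch (Python 3.4+) or the \Z anchor rejects it:

```python
import re

regex = re.compile(r"TYBN-\d+-\d+\.txt$")
name = "TYBN-220422-257172171.txt\n"  # e.g. a line read from a file, newline not stripped

# "$" also matches just before a single trailing newline, so this succeeds.
print(regex.match(name) is not None)                          # True
# fullmatch must consume the entire string, so the newline makes it fail.
print(re.fullmatch(r"TYBN-\d+-\d+\.txt", name) is not None)   # False
# \Z matches only at the very end of the string.
print(re.match(r"TYBN-\d+-\d+\.txt\Z", name) is not None)     # False
```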

Answer:

Try this one.

lst = ["TYBN-220422-257172171.txt",  "TYBN-120522-257172174.txt", "TYBN-320422-657172171.txt", "TYBN-220622-237172174.txt", "TYBN-FRTRE-FFF.txt"]

valid_format = ['TYBN', True, True]  # True marks a part that must be all digits
valid = []

for a in lst:
    l = a.replace('.txt', '').split('-')
    if l[0] == valid_format[0]:
        if [i.isdigit() for i in l[1:]] == valid_format[1:]:
            valid.append(a)

print(valid)

OUTPUT:

['TYBN-220422-257172171.txt',
 'TYBN-120522-257172174.txt',
 'TYBN-320422-657172171.txt',
 'TYBN-220622-237172174.txt']
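The split approach above also accepts a name without the .txt extension (for example TYBN-220422-257172171), because replace removes .txt wherever it occurs but never checks that it was there. A slightly stricter sketch, using a hypothetical helper is_valid:

```python
def is_valid(name: str, prefix: str = "TYBN", ext: str = ".txt") -> bool:
    # Hypothetical helper: require the extension at the end only,
    # instead of removing ".txt" wherever it appears.
    if not name.endswith(ext):
        return False
    parts = name[: -len(ext)].split("-")
    # Exactly three parts: the prefix and two numeric fields.
    if len(parts) != 3 or parts[0] != prefix:
        return False
    # isdecimal() is False for empty strings, and unlike isdigit()
    # it is also False for characters such as "²".
    return parts[1].isdecimal() and parts[2].isdecimal()

lst = ["TYBN-220422-257172171.txt", "TYBN-FRTRE-FFF.txt", "TYBN-220422-257172171"]
print([a for a in lst if is_valid(a)])
```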