Pattern for re not retrieving any results

Time:11-03

I'm trying to write a pattern for Python's re module to extract text of this form.

contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08' contentId: 'a887526b-ff19-4409-91ff-e1679e418922'

Each content ID is 36 characters long and is a mix of lowercase letters and digits, with dashes at zero-based positions 8, 13, 18 and 23.

Any help with this would be much appreciated as I just can't seem to get the results right now.

r1 = re.findall(r'^[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]*{36}$',f.read())
print(r1)

Below is the file I'm trying to pull from

Object.defineProperty(e, '__esModule', { value: !0 }), e.default = void 0;
var t = r(d[0])(r(d[1])), n = r(d[0])(r(d[2])), o = r(d[0])(r(d[3])), c = r(d[0])(r(d[4])), l = r(d[0])(r(d[5])), u = function (t) {
        return [
            {
                contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',
                prettyId: 'super',
                style: { height: 0.5 * t }
            },
            {
                contentId: 'a887526b-ff19-4409-91ff-e1679e418922',
                prettyId: 'zap',
                style: { height: t }
            }
        ];
    },

CodePudding user response:

Is there a typo in the regex in your question? The *{36} after the bracket ] that closes the character class raises an error: multiple repeat. Did you mean r'^[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]{36}$'?
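A minimal sketch that reproduces the error, assuming the pattern is copied exactly from the question. The variable names broken and error_message are only for illustration.

```python
import re

# Pattern copied from the question: "*" directly followed by "{36}".
broken = r'^[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]*{36}$'

try:
    re.compile(broken)
    error_message = None
except re.error as exc:
    # Two quantifiers in a row cannot be compiled.
    error_message = str(exc)

print(error_message)
```

Dropping the * so that {36} is the only quantifier lets the pattern compile.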

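Fixing that, you get no results because, without the re.MULTILINE flag, ^ anchors the match to the start of the whole string and $ to its end, so you'd only get a result if the file contained nothing but a single 36-character run of those characters. Even with re.MULTILINE, the pattern would have to be alone on a line.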
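A small sketch of the anchor behaviour, using a simplified character class ([a-z0-9-]) and a shortened two-line sample rather than the full file. Both are assumptions made to keep the example short.

```python
import re

# Two lines of text; the id is surrounded by other characters.
text = "contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',\nprettyId: 'super'"

# Anchored: the whole string would have to be exactly 36 matching characters.
anchored = re.findall(r'^[a-z0-9-]{36}$', text)

# The same anchored pattern does match when the id is the entire string.
alone = re.findall(r'^[a-z0-9-]{36}$', '2301ae56-3b9c-4653-963b-2ad84d06ba08')

# Unanchored: the id is found wherever it occurs.
unanchored = re.findall(r'[a-z0-9-]{36}', text)

print(anchored, alone, unanchored)
```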

Removing these anchors, we get lots of matches because the pattern matches any run of 36 of those characters:

r1 = re.findall(r'[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]{36}',t)
r1: ['var t = r(d[0])(r(d[1])), n = r(d[0]',
 ')(r(d[2])), o = r(d[0])(r(d[3])), c ',
 '= r(d[0])(r(d[4])), l = r(d[0])(r(d[',
 '2301ae56-3b9c-4653-963b-2ad84d06ba08',
 '                style: { height: 0.5',
 'a887526b-ff19-4409-91ff-e1679e418922',
 '                style: { height: t }']

To only match your ids, only look for alphanumeric characters or dashes.

r1 = re.findall(r'[a-zA-Z0-9\-]{36}',t)
r1: ['2301ae56-3b9c-4653-963b-2ad84d06ba08',
 'a887526b-ff19-4409-91ff-e1679e418922']

To make it even more specific, you could specify the positions of the dashes:

r1 = re.findall(r'[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}', t, re.IGNORECASE)

r1: ['2301ae56-3b9c-4653-963b-2ad84d06ba08',
 'a887526b-ff19-4409-91ff-e1679e418922']

Specifying the re.IGNORECASE flag removes the need to look for both upper- and lower-case characters.
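If other 36-character alphanumeric runs could appear in the file, one further option is to tie the match to the contentId key and capture only the quoted value. This is a sketch under the assumption that every id appears as contentId: '...' with single quotes, as in the sample. The names sample and ID_AFTER_KEY are made up for illustration.

```python
import re

# Shortened stand-in for the file contents.
sample = """
    contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',
    prettyId: 'super',
    contentId: 'a887526b-ff19-4409-91ff-e1679e418922',
"""

# One capturing group around the id; (?:...) groups are non-capturing,
# so findall returns just the ids.
ID_AFTER_KEY = re.compile(
    r"contentId:\s*'([a-z0-9]{8}(?:-[a-z0-9]{4}){3}-[a-z0-9]{12})'",
    re.IGNORECASE,
)

ids = ID_AFTER_KEY.findall(sample)
print(ids)
```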


Note:

  1. If you're going to use the file's contents more than once, read them into a variable and reuse that variable: after the first f.read(), further calls return an empty string unless you f.seek(0) first.

  2. To avoid creating a new file on disk with those contents, I just defined

t = """Object.defineProperty(e, '__esModule', { value: !0 }), e.default = void 0;
var t = r(d[0])(r(d[1])), n = r(d[0])(r(d[2])), o = r(d[0])(r(d[3])), c = r(d[0])(r(d[4])), l = r(d[0])(r(d[5])), u = function (t) {
        return [
            {
                contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',
                prettyId: 'super',
                style: { height: 0.5 * t }
            },
            {
                contentId: 'a887526b-ff19-4409-91ff-e1679e418922',
                prettyId: 'zap',
                style: { height: t }
            }
        ];
    },"""

and used t in place of f.read() from your question.
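Note 1 can be sketched as follows. io.StringIO stands in for the opened file here so that nothing is written to disk, which is an assumption of this example; a real file object opened in text mode behaves the same way for read() and seek(0).

```python
import io
import re

# In-memory stand-in for the file opened in the question.
f = io.StringIO("contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08'")

first = f.read()    # full contents
second = f.read()   # empty string: the position is already at the end

f.seek(0)           # rewind to the beginning
third = f.read()    # full contents again

# Reusing the variable avoids the problem entirely.
ids = re.findall(r'[a-z0-9-]{36}', first)
print(repr(second), ids)
```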
