Pattern for re not retrieving any results

I'm trying to create a re pattern in Python to extract text like this:

contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08' contentId: 'a887526b-ff19-4409-91ff-e1679e418922'

Each content ID is 36 characters long and is a mix of lowercase letters and digits, with dashes at positions 8, 13, 18 and 23 (counting from 0).
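The length and dash positions can be checked directly in Python (indices are 0-based). This is a small sketch using the first ID from the question:

```python
content_id = '2301ae56-3b9c-4653-963b-2ad84d06ba08'

# Total length of the ID
length = len(content_id)

# 0-based index of every dash in the ID
dash_positions = [i for i, ch in enumerate(content_id) if ch == '-']

print(length)          # 36
print(dash_positions)  # [8, 13, 18, 23]
```

The last index in a 36-character string is 35, so no dash can sit at position 36.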

Any help with this would be much appreciated, as I just can't seem to get the right results.

r1 = re.findall(r'^[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]*{36}$',f.read())
print(r1)

Below is the file I'm trying to extract from:

Object.defineProperty(e, '__esModule', { value: !0 }), e.default = void 0;
var t = r(d[0])(r(d[1])), n = r(d[0])(r(d[2])), o = r(d[0])(r(d[3])), c = r(d[0])(r(d[4])), l = r(d[0])(r(d[5])), u = function (t) {
        return [
            {
                contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',
                prettyId: 'super',
                style: { height: 0.5 * t }
            },
            {
                contentId: 'a887526b-ff19-4409-91ff-e1679e418922',
                prettyId: 'zap',
                style: { height: t }
            }
        ];
    },

Answer:

Is there a typo in the regex in your question? The *{36} after the ] that closes the character class raises re.error: multiple repeat, because {36} cannot be applied to something that is already quantified by *. Did you mean r'^[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]{36}$'?
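A minimal way to reproduce the error is to compile both versions of the pattern. The variable names here are illustrative; exc.msg is the unformatted message attribute of re.error.

```python
import re

# The pattern exactly as written in the question: '*' immediately followed by '{36}'
broken = r'^[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]*{36}$'
# The presumably intended pattern, with only the '{36}' quantifier
fixed = r'^[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]{36}$'

error_message = None
try:
    re.compile(broken)
except re.error as exc:
    # exc.msg is the bare message, without the "at position N" suffix
    error_message = exc.msg

print(error_message)  # multiple repeat

# The fixed pattern compiles without complaint
compiled = re.compile(fixed)
print(compiled.pattern == fixed)  # True
```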

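Fixing that, you still get no results. Without the re.MULTILINE flag, ^ anchors the match to the start of the whole string and $ to its end, so the pattern only matches if the entire input is exactly 36 of those characters. Even with re.MULTILINE, where ^ and $ also match at line boundaries, each ID sits inside a longer line (and the quote character ' is not in your class), so no ID line can match.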
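The effect of the anchors can be seen on a small sample. The three-line sample below is hypothetical, modelled on lines from the file in the question (16-space indent). Note that with re.MULTILINE the pattern does find a 36-character line, but it is the style line, not an ID:

```python
import re

# Hypothetical trimmed-down sample modelled on the file in the question
indent = ' ' * 16
sample = '\n'.join([
    indent + "contentId: 'a887526b-ff19-4409-91ff-e1679e418922',",
    indent + "prettyId: 'zap',",
    indent + 'style: { height: t }',  # 16 + 20 = 36 characters, none of them a quote
])

# The (typo-fixed) pattern from the question, anchors included
pattern = r'^[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]{36}$'

# No flags: ^ and $ only match at the start and end of the whole string
no_flags = re.findall(pattern, sample)

# re.MULTILINE: ^ and $ also match at line boundaries, so any line made of
# exactly 36 allowed characters matches; the ID lines fail because of the quotes
multiline = re.findall(pattern, sample, re.MULTILINE)

# A string consisting of nothing but an ID does match
only_id = re.findall(pattern, '2301ae56-3b9c-4653-963b-2ad84d06ba08')

print(no_flags)                                            # []
print(multiline == [indent + 'style: { height: t }'])      # True
print(only_id)  # ['2301ae56-3b9c-4653-963b-2ad84d06ba08']
```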

Removing the anchors, we get lots of matches, because the pattern now matches any run of 36 characters from that set, not just the IDs:

r1 = re.findall(r'[a-zA-Z0-9~@#$^*()_ =[\]{}|\\,.?: -]{36}',t)
r1: ['var t = r(d[0])(r(d[1])), n = r(d[0]',
 ')(r(d[2])), o = r(d[0])(r(d[3])), c ',
 '= r(d[0])(r(d[4])), l = r(d[0])(r(d[',
 '2301ae56-3b9c-4653-963b-2ad84d06ba08',
 '                style: { height: 0.5',
 'a887526b-ff19-4409-91ff-e1679e418922',
 '                style: { height: t }']

To match only your IDs, look for nothing but alphanumeric characters and dashes:

r1 = re.findall(r'[a-zA-Z0-9\-]{36}',t)
r1: ['2301ae56-3b9c-4653-963b-2ad84d06ba08',
 'a887526b-ff19-4409-91ff-e1679e418922']

To make it even more specific, you could specify the positions of the dashes:

r1 = re.findall(r'[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}', t, re.IGNORECASE)

r1: ['2301ae56-3b9c-4653-963b-2ad84d06ba08',
 'a887526b-ff19-4409-91ff-e1679e418922']

Specifying the re.IGNORECASE flag removes the need to look for both upper- and lower-case characters.
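If the IDs are UUIDs (they have the 8-4-4-4-12 layout, though the question doesn't say so), the classes can be narrowed to hex digits, and a capturing group keyed on contentId: returns only the IDs that follow that key. This is a sketch under that assumption; the uuid.UUID call is an optional extra check that raises ValueError for malformed input.

```python
import re
import uuid

text = """contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',
prettyId: 'super',
contentId: 'a887526b-ff19-4409-91ff-e1679e418922',
prettyId: 'zap',"""

# Hex digits only, in the 8-4-4-4-12 grouping; with one capturing group,
# findall returns just the ID, not the surrounding "contentId: '...'" text
CONTENT_ID = re.compile(
    r"contentId:\s*'([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})'",
    re.IGNORECASE,
)

ids = CONTENT_ID.findall(text)
print(ids)

# Optional validation: uuid.UUID raises ValueError if a string is not a UUID
parsed = [uuid.UUID(i) for i in ids]
print([p.version for p in parsed])  # [4, 4]
```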

Note:

You should read the file into a variable and reuse that variable if you need its contents more than once: after the first f.read() the file position is at the end, so a second f.read() returns an empty string unless you call f.seek(0) first.
To avoid creating a new file on disk with those contents, I just defined

t = """Object.defineProperty(e, '__esModule', { value: !0 }), e.default = void 0;
var t = r(d[0])(r(d[1])), n = r(d[0])(r(d[2])), o = r(d[0])(r(d[3])), c = r(d[0])(r(d[4])), l = r(d[0])(r(d[5])), u = function (t) {
        return [
            {
                contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',
                prettyId: 'super',
                style: { height: 0.5 * t }
            },
            {
                contentId: 'a887526b-ff19-4409-91ff-e1679e418922',
                prettyId: 'zap',
                style: { height: t }
            }
        ];
    },"""

and used t in place of f.read() from your question.
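The f.read() behaviour from the note can be demonstrated without a file on disk. In this sketch io.StringIO stands in for the file object f (an assumption: it follows the same read()/seek() semantics as a text file from open()), and the final pattern is a compact hex-only equivalent of the positional one, using a repeated non-capturing group.

```python
import io
import re

# io.StringIO stands in for the file object f from the question
f = io.StringIO("contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08'")

first = f.read()   # whole contents; the position is now at the end
second = f.read()  # nothing left to read, so this is ''
f.seek(0)          # rewind to the start
third = f.read()   # whole contents again

print(repr(second))    # ''
print(first == third)  # True

# Reading once into a variable and reusing it avoids the issue entirely
text = first
ids = re.findall(r'[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}', text)
print(ids)  # ['2301ae56-3b9c-4653-963b-2ad84d06ba08']
```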