Python re.findall Not Matching JS Variables in HTML

Time:11-11

I am trying to extract integers and variable values defined in JavaScript in an HTML file using Python 3's re.findall method.

However, I am having difficulty matching digits enclosed in double quotes with \d*, and matching an alphanumeric string enclosed in double quotes as well.

Case 1:

import re

s = """
   <script>
    var i = 1636592595;
        var j = i + Number("6876" + "52907");
   </script>
"""
pattern = r'var j = i + Number(\"(\d*)\" + \"(\d*)\");'
m = re.findall(pattern, s)
print(m) # Output: []

The desired output should contain 6876 and 52907, but an empty list [] was obtained.

Case 2:

s = """
       xhr.send(JSON.stringify({
              "bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
              "pow": j
          }));
"""
pattern = r'"bm-foo": \"(\w*)\",'
m = re.findall(pattern, s)
print(m) # Output: []

The desired output should contain AAQAAAAE/////4ytkgqq/oWI, but an empty list [] was obtained.

Can someone help explain why my regex patterns are not matching?

CodePudding user response:

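In the first regexp you need to escape +, (, and ). Unescaped, + is a quantifier and ( ) delimit a group, so none of them match the literal characters in the JavaScript source.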

In the second regexp, use [^"]* instead of \w*, since \w doesn't match punctuation like /.
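
To see why \w* stops short, here is a small sketch that runs both character classes against the token from the question. The variable names (token, word_runs, quoted) are illustrative only and are not part of the original code.

```python
import re

# Token copied from the "bm-foo" value in the question.
token = "AAQAAAAE/////4ytkgqq/oWI"

# \w matches only letters, digits, and underscore, so the slashes split the token.
word_runs = re.findall(r"\w+", token)
print(word_runs)  # ['AAQAAAAE', '4ytkgqq', 'oWI']

# [^"]* matches any run of characters up to the next double quote.
quoted = re.findall(r'"([^"]*)"', f'"{token}"')
print(quoted)  # ['AAQAAAAE/////4ytkgqq/oWI']
```

Because \w* cannot cross a /, the original pattern never reaches the closing quote and the whole match fails.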

import re

s = """
   <script>
    var i = 1636592595;
        var j = i + Number("6876" + "52907");
   </script>
"""
# + and the parentheses are escaped so they match literally
pattern = r'var j = i \+ Number\("(\d*)" \+ \"(\d*)\"\);'
m = re.findall(pattern, s)
print(m) # Output: [('6876', '52907')]

s = """
       xhr.send(JSON.stringify({
              "bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
              "pow": j
          }));
"""
pattern = r'"bm-foo": "([^"]*)",'
m = re.findall(pattern, s)
print(m) # Output: ['AAQAAAAE/////4ytkgqq/oWI']
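
As an alternative to escaping metacharacters by hand, the literal parts of the first pattern could be passed through re.escape and joined around the capture groups. This is only a sketch: it assumes the JavaScript line uses + operators (where the snippets show runs of spaces), and the literal_parts list is a hypothetical helper, not something from the original answer.

```python
import re

s = """
   <script>
    var i = 1636592595;
        var j = i + Number("6876" + "52907");
   </script>
"""

# Literal fragments of the target line, split where the digits should be captured.
literal_parts = ['var j = i + Number("', '" + "', '");']

# re.escape backslash-escapes regex metacharacters such as +, ( and ).
pattern = r"(\d*)".join(re.escape(part) for part in literal_parts)

m = re.findall(pattern, s)
print(m)  # [('6876', '52907')]
```

With two capture groups, re.findall returns a list of tuples, one tuple per match.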
