I am trying to extract integers and variable values defined in JavaScript in an HTML file using Python 3's re.findall method.
However, I am having difficulty matching digits enclosed in " with \d*, and matching an alphanumeric string enclosed in " as well.
Case 1:
s = """
<script>
var i = 1636592595;
var j = i + Number("6876" + "52907");
</script>
"""
pattern = r'var j = i + Number(\"(\d*)\" + \"(\d*)\");'
m = re.findall(pattern, s)
print(m) # Output: []
The desired output should contain 6876 and 52907, but an empty list [] was obtained.
Case 2:
s = """
xhr.send(JSON.stringify({
"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
"pow": j
}));
"""
pattern = r'"bm-foo": \"(\w*)\",'
m = re.findall(pattern, s)
print(m) # Output: []
The desired output should contain AAQAAAAE/////4ytkgqq/oWI, but an empty list [] was obtained.
Can someone explain why my regex patterns are not matching?
CodePudding user response:
In the first regexp you need to escape +, (, and ), because they are regex metacharacters: + is a quantifier, and ( and ) delimit a group.
In the second regexp, use [^"]* instead of \w*, since \w doesn't match punctuation like /.
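To see why the two original patterns return nothing, here is a minimal sketch comparing each one with its corrected form. It assumes the JavaScript line contains a literal + between i and Number(...) and between the two quoted numbers; the variable names are only for illustration.

```python
import re

line = 'var j = i + Number("6876" + "52907");'

# Unescaped: " +" means "one or more spaces", and "(" ... ")" form a group,
# so none of these characters are matched literally.
unescaped = r'var j = i + Number(\"(\d*)\" + \"(\d*)\");'
# Escaped: "\+", "\(" and "\)" match the literal characters.
escaped = r'var j = i \+ Number\("(\d*)" \+ "(\d*)"\);'

unescaped_result = re.findall(unescaped, line)
escaped_result = re.findall(escaped, line)
print(unescaped_result)  # []
print(escaped_result)    # [('6876', '52907')]

value = '"AAQAAAAE/////4ytkgqq/oWI"'

# \w only covers letters, digits and underscore, so it stops at the first "/".
word_result = re.findall(r'"(\w*)"', value)
# [^"]* accepts any character except the closing quote.
negated_result = re.findall(r'"([^"]*)"', value)
print(word_result)     # []
print(negated_result)  # ['AAQAAAAE/////4ytkgqq/oWI']
```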
import re
s = """
<script>
var i = 1636592595;
var j = i + Number("6876" + "52907");
</script>
"""
pattern = r'var j = i \+ Number\("(\d*)" \+ \"(\d*)\"\);'
m = re.findall(pattern, s)
print(m) # Output: [('6876', '52907')]
s = """
xhr.send(JSON.stringify({
"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
"pow": j
}));
"""
pattern = r'"bm-foo": "([^"]*)",'
m = re.findall(pattern, s)
print(m) # Output: ['AAQAAAAE/////4ytkgqq/oWI']
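As an alternative to escaping each metacharacter by hand, the literal parts of the pattern can be passed through re.escape. The sketch below assumes the JavaScript line contains literal + signs; literal_pattern is a hypothetical helper written for this example, not part of the re module.

```python
import re

s = 'var j = i + Number("6876" + "52907");'

def literal_pattern(*parts, group=r'(\d*)'):
    """Hypothetical helper: escape each literal piece with re.escape and
    join the pieces with a capture group."""
    return group.join(re.escape(part) for part in parts)

# The three literal pieces surround the two numbers we want to capture.
pattern = literal_pattern('var j = i + Number("', '" + "', '");')
result = re.findall(pattern, s)
print(result)  # [('6876', '52907')]
```

This keeps the source text readable in the code and avoids forgetting a backslash, at the cost of a slightly longer generated pattern.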