AttributeError: 'NoneType' object has no attribute 'group' (for json data with p-CodePudding

I have the same site with same json structure. It works for one but doesn't work for second link. What is the reason? What is the logic of "group(1)"

Code

import requests, re, json

r1 = requests.get('https://www.trendyol.com/apple/macbook-air-13-m1-8gb-256gb-ssd-altin-p-67940132').text
r2 = requests.get('https://www.trendyol.com/lenovo/v15-82nb003gtx-i5-10210u-8gb-512gb-mx330-ssd-15-6-fhd-windows-10-home-dizustu-bilgisayar-p-320147814').text

json_data1 = json.loads(re.search(r"window.__PRODUCT_DETAIL_APP_INITIAL_STATE__=({.*}});window", r1).group(1))
print('json_data1:',json_data1['product']['attributes'][0]['value']['name'])
print('-------')
json_data2 = json.loads(re.search(r"window.__PRODUCT_DETAIL_APP_INITIAL_STATE__=({.*}});window", r2).group(1))
print('json_data2:',json_data2['product']['attributes'][0]['value']['name'])

Output

json_data1: Apple M1
-------
Traceback (most recent call last):
  File "/Users/cucal/Desktop/scraper.py", line 33, in <module>
    json_data2 = json.loads(re.search(r"window.__PRODUCT_DETAIL_APP_INITIAL_STATE__=({.*}});window", r2).group(1))
AttributeError: 'NoneType' object has no attribute 'group'
[Finished in 453ms with exit code 1]
[cmd: ['python3', '-u', '/Users/cucal/Desktop/scraper.py']]
[dir: /Users/cucal/Desktop]
[path: /opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/Apple/usr/bin:/Applications/Postgres.app/Contents/Versions/latest/bin]

CodePudding user response：

Did you see the diffence,

In [1]: re.search(r"window.__PRODUCT_DETAIL_APP_INITIAL_STATE__=({.*}});window", r1)
Out[1]: <re.Match object; span=(136194, 164410), match='window.__PRODUCT_DETAIL_APP_INITIAL_STATE__={"pro>

In [2]: re.search(r"window.__PRODUCT_DETAIL_APP_INITIAL_STATE__=({.*}});window", r2)

The first one returns None That means there is no matching for the regex you provided there. That's why you got error for second one.

What is group(1) in your regex?

Here is the representation.

For the second sample you have to do like this,

matches = re.search(r"window.__PRODUCT_DETAIL_APP_INITIAL_STATE__\s=\s({.*}})", r2)
if matches:
    json_data2 = json.loads(matches.group(1))
    print('json_data2:',json_data2['product']['attributes'][0]['value']['name'])

Which output:

json_data2: Intel Core i5

And if you observe the second webpage source, it has spaces before the = so you need to adjust the regex accordingly.

New regex for the second sample, 'regex_demo'

r"window.__PRODUCT_DETAIL_APP_INITIAL_STATE__\s=\s({.*}})"