A Python program that fetches response data from an AJAX website?

Please note that I am new to programming. I ran into these problems while learning web scraping with Python. The website I used was https://www.mobikwik.com/ (an online recharge and payment site for mobile, DTH, and electricity bills), but all I get when scraping is a 403 response. I then figured this might be because the website uses AJAX. My goal was to take a mobile number as user input and pass it to the mobile operator search on the website; the page then loads the current operator and circle, which I want to display in my program. The Python phonenumbers module is useless if the mobile number has been ported to another operator. Any help is appreciated. Thank you.

CodePudding user response:

There are two XHR requests, and I'm not sure which one you wanted, so I reproduced both. All you need to do is recreate the requests, including their headers.

getconnectiondetails:

scrapy shell

In [1]: phone_number = '9820123456'

In [2]: url = 'https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn='

In [3]: headers = {
   ...: "Accept": "application/json, text/plain, */*",
   ...: "Accept-Encoding": "gzip, deflate, br",
   ...: "Accept-Language": "en-US,en;q=0.5",
   ...: "Cache-Control": "no-cache",
   ...: "Connection": "keep-alive",
   ...: "DNT": "1",
   ...: "Host": "rapi.mobikwik.com",
   ...: "Origin": "https://www.mobikwik.com",
   ...: "Pragma": "no-cache",
   ...: "Referer": "https://www.mobikwik.com/",
   ...: "Sec-Fetch-Dest": "empty",
   ...: "Sec-Fetch-Mode": "cors",
   ...: "Sec-Fetch-Site": "same-site",
   ...: "Sec-GPC": "1",
   ...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
   ...: "X-MClient": "0"
   ...: }

In [4]: req = scrapy.Request(url=url + phone_number, headers=headers)

In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn=9820123456> (referer: https://www.mobikwik.com/)

In [6]: json_data = response.json()

In [7]: json_data['data']['operatorId']
Out[7]: 338

In [8]: json_data['data']['circleId']
Out[8]: 15
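
Outside the shell, the same lookup can be split into two small helpers: one that builds the URL and one that reads the two IDs from the parsed JSON. This is only a minimal sketch. The function names are hypothetical, and the response shape (a `data` object with `operatorId` and `circleId`) is assumed from the session above, not from any documented API. The sample dictionary stands in for `response.json()` so the snippet runs without a network call.

```python
from urllib.parse import urlencode

# Endpoint taken from the shell session above.
BASE_URL = "https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails"


def build_connection_url(phone_number: str) -> str:
    """Return the lookup URL with the number passed as the `cn` query parameter."""
    return f"{BASE_URL}?{urlencode({'cn': phone_number})}"


def extract_operator_and_circle(json_data: dict) -> tuple:
    """Return (operatorId, circleId), or None for any value that is missing.

    Assumes the shape seen in the session: {"data": {"operatorId": ..., "circleId": ...}}.
    """
    data = json_data.get("data") or {}
    return data.get("operatorId"), data.get("circleId")


# Stand-in for response.json(); the values mirror the session output above.
sample = {"data": {"operatorId": 338, "circleId": 15}}

print(build_connection_url("9820123456"))
print(extract_operator_and_circle(sample))
```

Using `.get()` instead of direct indexing means a response without a `data` key yields `(None, None)` rather than raising `KeyError`, which is easier to handle when the number is invalid or the request is rejected.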

recommendedplans:

scrapy shell

In [1]: phone_number = '9820123456'

In [2]: url = 'https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn='

In [3]: headers = {
   ...: "Accept": "application/json, text/plain, */*",
   ...: "Accept-Encoding": "gzip, deflate, br",
   ...: "Accept-Language": "en-US,en;q=0.5",
   ...: "Cache-Control": "no-cache",
   ...: "Connection": "keep-alive",
   ...: "DNT": "1",
   ...: "Host": "rapi.mobikwik.com",
   ...: "Origin": "https://www.mobikwik.com",
   ...: "Pragma": "no-cache",
   ...: "Referer": "https://www.mobikwik.com/",
   ...: "Sec-Fetch-Dest": "empty",
   ...: "Sec-Fetch-Mode": "cors",
   ...: "Sec-Fetch-Site": "same-site",
   ...: "Sec-GPC": "1",
   ...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
   ...: "X-MClient": "0"
   ...: }

In [4]: req = scrapy.Request(url=url + phone_number, headers=headers)

In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn=9820123456> (referer: https://www.mobikwik.com/)

In [6]: json_data = response.json()

In [7]: for item in json_data['data']['plans']:
   ...:     print(item['id'])
   ...:
1104293
1155779
1155937
1164885
1156067
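
The `338/15` in the recommendedplans URL appears to be the `operatorId` and `circleId` returned by getconnectiondetails, so the two requests can be chained instead of hard-coding those values. The sketch below assumes that reading of the URL is correct; the helper names are hypothetical, and the sample dictionary is a stand-in for `response.json()` that reuses two IDs from the output above.

```python
# URL pattern taken from the shell session above; the two path segments are
# assumed to be the operator ID and circle ID from getconnectiondetails.
PLANS_URL = (
    "https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/"
    "recommendedplans/{operator_id}/{circle_id}?cn={phone_number}"
)


def build_plans_url(operator_id: int, circle_id: int, phone_number: str) -> str:
    """Fill the operator ID, circle ID, and phone number into the plans URL."""
    return PLANS_URL.format(
        operator_id=operator_id,
        circle_id=circle_id,
        phone_number=phone_number,
    )


def plan_ids(json_data: dict) -> list:
    """Return the `id` of every entry under data.plans (empty list if absent)."""
    plans = (json_data.get("data") or {}).get("plans") or []
    return [plan["id"] for plan in plans]


# Stand-in for response.json(); only the `id` field is modelled here.
sample = {"data": {"plans": [{"id": 1104293}, {"id": 1155779}]}}

print(build_plans_url(338, 15, "9820123456"))
print(plan_ids(sample))
```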