I'm trying to use the hidden Airbnb API. I need to reverse engineer where the ID in the query string of a GET request comes from. For example, take this listing:
https://www.airbnb.ca/rooms/47452643
The "public" ID is 47452643. However, a different ID is needed to use the API.
If you look at the XHR requests in Chrome, you'll see a request starting with "StaysPdpSections?operationName". This is the request I want to replicate. If I copy the request into Insomnia or Postman, I see a variable in the query string starting with:
"variables":"{"id":"U3RheUxpc3Rpbmc6NDc0NTI2NDM="
The hidden ID "U3RheUxpc3Rpbmc6NDc0NTI2NDM=" is what I need. The request only returns data if this ID is included in the query string. How can I recover the hidden ID for each listing dynamically?
CodePudding user response:
That target ID is buried really deep in the HTML...
import json
import requests
from bs4 import BeautifulSoup as bs
url = 'https://www.airbnb.ca/rooms/47452643'
req = requests.get(url)
soup = bs(req.content, 'html.parser')
# The page embeds its initial state as JSON in a <script id="data-state"> tag
script = soup.select_one('script[type="application/json"][id="data-state"]')
data = json.loads(script.text)
# The [2][1] indices match the page layout seen for this listing and may change
target = data.get('niobeMinimalClientData')[2][1]['variables']
print(target.get('id'))
Output:
U3RheUxpc3Rpbmc6NDc0NTI2NDM=
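As a side note, the value printed above is plain Base64: decoding U3RheUxpc3Rpbmc6NDc0NTI2NDM= gives "StayListing:47452643", i.e. a type name, a colon, and the public ID. If that holds for other listings too (an assumption based on this single example, not something Airbnb documents), the hidden ID could be built without fetching the page at all. A minimal sketch follows; the function names to_hidden_id, from_hidden_id and variables_param are made up for illustration, and only the "variables" parameter from the question is reproduced, not the rest of the request:

```python
import base64
import json
from urllib.parse import urlencode


def to_hidden_id(public_id, type_name="StayListing"):
    """Base64-encode "<type_name>:<public_id>".

    The "StayListing" prefix is taken from decoding the one example ID in
    the question; it is an assumption that every listing uses it.
    """
    raw = f"{type_name}:{public_id}".encode("ascii")
    return base64.b64encode(raw).decode("ascii")


def from_hidden_id(hidden_id):
    """Decode a hidden ID back into (type_name, public_id)."""
    # Restore any "=" padding that was dropped when the value was copied.
    padded = hidden_id + "=" * (-len(hidden_id) % 4)
    decoded = base64.b64decode(padded).decode("ascii")
    type_name, _, public_id = decoded.partition(":")
    return type_name, public_id


def variables_param(hidden_id):
    """URL-encode a compact {"id": ...} JSON object as the "variables" parameter."""
    payload = json.dumps({"id": hidden_id}, separators=(",", ":"))
    return urlencode({"variables": payload})


if __name__ == "__main__":
    hidden = to_hidden_id(47452643)
    print(hidden)
    print(from_hidden_id(hidden))
    print(variables_param(hidden))
```

If the API ever rejects an ID built this way, fall back to scraping it from the page as shown in the answer, since the real request may carry other variables that this sketch does not cover.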