Web scraping with BeautifulSoup .find() always returns None

Time:05-30

Relevant part of the DOM: [screenshot of the DOM]

This is the code I wrote:

from bs4 import BeautifulSoup
import requests

URL = 'https://www.cheapflights.com.sg/flight-search/SIN-KUL/2022-06-04?sort=bestflight_a&attempt=3&lastms=1653844067064'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
flight = soup.find('div', class_='resultWrapper')
print(flight)

Whenever print(flight) is executed, the result is always None. I have tried other div tags with different class names, but it still always returns None. The soup itself seems fine, because print(soup) returns a text version of the DOM, so the problem seems to be with the next line.

Any suggestions on how I can get something other than None? Thank you!

CodePudding user response:

That's because of the User-Agent. If I curl the page without changing the default User-Agent, the site returns a different page from the one a browser gets.

Change your code like this, so that your program is less likely to be detected:

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ..."
}
page = requests.get(URL, headers=headers)
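
To check what the header change actually does, here is a minimal sketch that builds the request without sending it, so the outgoing User-Agent can be inspected offline. The helper name preview_headers is hypothetical (it is not part of requests or of the answer above), the URL is shortened for readability, and the truncated User-Agent string is kept exactly as given in the answer.

```python
import requests

# Hypothetical helper for illustration: prepare the request through a Session
# (which merges in the library's default headers) but never send it.
def preview_headers(url, headers=None):
    session = requests.Session()
    prepared = session.prepare_request(
        requests.Request("GET", url, headers=headers)
    )
    return prepared.headers

# Shortened form of the URL from the question (assumption: query string dropped).
URL = "https://www.cheapflights.com.sg/flight-search/SIN-KUL/2022-06-04"

# Without custom headers, requests identifies itself as "python-requests/<version>".
default_ua = preview_headers(URL)["User-Agent"]

# With the header from the answer, the browser-like value replaces the default.
custom_ua = preview_headers(
    URL, {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ..."}
)["User-Agent"]

print(default_ua)
print(custom_ua)
```

The default value makes it easy for a site to recognise a script, which is probably why the two requests get different pages. A browser-like User-Agent alone may still not be enough if the site applies other checks.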

CodePudding user response:

BeautifulSoup .find() always returns None because the flight results are not part of the initial HTML. They are loaded afterwards by an Ajax request to a separate API URL, whose JSON response contains the results as HTML. So to grab the data, you have to request the API URL.
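
The difference can be reproduced offline. The sketch below uses two made-up strings as stand-ins: static_html for the page shell that requests.get() would receive, and ajax_body for a JSON reply whose 'content' field holds the rendered results. Both are assumptions for illustration, not the site's real markup.

```python
import json
from bs4 import BeautifulSoup

# Made-up stand-in for the initial page: an empty container filled in later by JavaScript.
static_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'

# Made-up stand-in for the Ajax reply: JSON with an HTML fragment under "content".
ajax_body = json.dumps({
    "content": '<div class="resultWrapper"><span class="price-text">S$ 135</span></div>'
})

# Searching the shell finds nothing, because the element was never in that HTML.
shell = BeautifulSoup(static_html, "html.parser")
print(shell.find("div", class_="resultWrapper"))  # None

# Parsing the HTML fragment taken from the JSON reply finds the element.
fragment = BeautifulSoup(json.loads(ajax_body)["content"], "html.parser")
wrapper = fragment.find("div", class_="resultWrapper")
price = wrapper.find("span", class_="price-text").get_text(strip=True)
print(price)  # S$ 135
```

A quick way to tell which case applies to a real page is to check whether the class name appears anywhere in page.text. If it does not, no selector will find it in that response.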

Example with full working code:

import requests
import json
from bs4 import BeautifulSoup

payload = 'searchId=LbEiRwhKF_&poll=true&pollNumber=0&applyFilters=true&filterState=&useViewStateFilterState=true&pageNumber=1&watchedResultId=&append=false&sortMode=bestflight&ascending=true&priceType=daybase&requestReason=POLL&phoenixRising=true&isSecondPhase=false&displayAdPageLocations=left,bottom-left,bottom,upper-right,right&existingAds=false&activeLeg=-1&hasFilterPreferences=false&view=list&renderPlusMinusThreeFlex=false&renderAirlineStopsMatrix=false&requestAlternateFlexDates=false&ajaxts=1653846256069&scriptsMetadata=16wB&B20Q1CQUBI3QEH&g9$g6B21CBiIiwYar1#CgI9CEB5g5UBD1L5I1D32Gg14gCF&PgE2B1iw1osDQBiz!1QF1EI1B3Eg30Cg=&stylesMetadata=22Dg1U74giE9E4Q18g1C1EB4Q16C21G3I1zQc1HhhYMIw6JQ2Q1gE7Q1gQI2IQ1g3G43HQ24C12ju1e4ICQGrI1h1wCQ%&CCK1ED7I289C50Q79BlRfBHg6BQ2Q36E4C1CRIC45B367IkQJ1gSZIkSZ10CI1k1BI10J121k87gQR1g53I1kSR30Q4J1kSZIgSZIkCQ9k1Q3JIkCQI1SQIkSZIkSQ3JIgCQ1gC11QJIkSZIkSRIgSQ85gSZI1SJIkSZIkSQ1kCR1gSJIk1Q10QZ3Q2SBIE1BI97Q10k1Q5g5Z1k1Z1E1I1kSRIECZIgQRIkSZIkCQ1ESIIEQQI1SZI460gSZIkC182kCQ==&r9version=R618c'

api_url = 'https://www.cheapflights.com.sg/s/horizon/flights/results/FlightSearchPoll?p=0'

headers = {
    "content-type": "application/x-www-form-urlencoded",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.62 Safari/537.36",
    "x-csrf": "$_DAcoZoQE$1Qxq2wktuQeX1wM66H6xJBg6LXd0u0KM-biFU0Ll7e68O886T7kg2pLDoSs$ycoT1x9xj50oeIEA",
    "x-r9-blue-green-version": "R618c",
    "x-requested-with": "XMLHttpRequest",
    "x-requestid": "flights#results#dfYzFA",
}

# POST the form-encoded payload to the polling endpoint that the page calls via Ajax
req = requests.post(api_url, headers=headers, data=payload)

# The JSON response carries the rendered results as an HTML string under 'content'
data = json.loads(req.text)['content']

# Optional: save the HTML fragment to a file for inspection
# with open('ajax.html', 'w', encoding="utf-8") as f:
#     f.write(data)

soup = BeautifulSoup(data, 'lxml')

# Each price is rendered in an element with the class 'price-text'
for price in soup.select('.price-text'):
    print(price.get_text(strip=True))

Output:

S$ 135
S$ 120
S$ 120
S$ 121
S$ 126
S$ 127
S$ 127
S$ 133
S$ 133
S$ 134
S$ 135
S$ 137
S$ 146
S$ 164
S$ 120
S$ 148
S$ 157
S$ 160
S$ 165
S$ 167
S$ 171
S$ 174
S$ 177
S$ 178
S$ 180
S$ 184
S$ 189
S$ 192
S$ 286
S$ 146
S$ 154
S$ 154
S$ 157
S$ 163
S$ 167
S$ 168
S$ 168
S$ 177
S$ 184
S$ 149
S$ 157
S$ 174
S$ 176
S$ 176
S$ 187
S$ 191
S$ 191
S$ 200
S$ 211
S$ 149
S$ 154
S$ 154
S$ 157
S$ 163
S$ 167
S$ 168
S$ 168
S$ 177
S$ 184
S$ 149
S$ 152
S$ 153
S$ 162
S$ 164
S$ 165
S$ 172
S$ 174
S$ 182
S$ 183
S$ 187
S$ 191
S$ 200
S$ 200
S$ 223
S$ 151
S$ 182
S$ 165
S$ 167
S$ 169
S$ 169
S$ 171
S$ 171
S$ 174
S$ 177
S$ 178
S$ 180
S$ 184
S$ 189
S$ 190
S$ 193
S$ 160
S$ 176
S$ 176
S$ 180
S$ 187
S$ 191
S$ 191
S$ 198
S$ 201
S$ 211
S$ 171
S$ 175
S$ 188
S$ 189
S$ 190
S$ 196
S$ 199
S$ 202
S$ 207
S$ 209
S$ 209
S$ 210
S$ 213
S$ 213
S$ 246
S$ 174
S$ 188
S$ 189
S$ 190
S$ 190
S$ 197
S$ 199
S$ 199
S$ 202
S$ 209
S$ 209
S$ 210
S$ 213
S$ 213
S$ 246
S$ 175
S$ 182
S$ 198
S$ 198
S$ 203
S$ 211
S$ 213
S$ 215
S$ 220
S$ 226
S$ 239
S$ 193
S$ 247
S$ 251
S$ 255
S$ 256
S$ 256
S$ 259
S$ 259
S$ 260
S$ 266
S$ 267
S$ 269
S$ 286
S$ 323
S$ 236
S$ 132
S$ 132
S$ 133
S$ 139
S$ 143
S$ 144
S$ 146
S$ 155
S$ 158
S$ 127
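
The prices above are plain strings. If they need to be compared or sorted, a small conversion step helps. This is a minimal sketch: parse_price is a hypothetical helper, and it assumes every price looks like 'S$ 135', optionally with thousands separators.

```python
import re

def parse_price(text):
    """Return the integer amount from a string such as 'S$ 135', or None if no digits are found."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None

# A few values taken from the output above
sample = ["S$ 135", "S$ 120", "S$ 286"]
amounts = [parse_price(p) for p in sample]

print(min(amounts), max(amounts))  # 120 286
```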