I'm just learning web scrapping & want to output the result of this website to a csv file https://www.avbuyer.com/aircraft/private-jets
but am struggling with year, sn & time field in the below code - when I put "soup" in place of "post" it works but not when I want to put them together any help would be much appreciated
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.avbuyer.com/aircraft/private-jets'
page = requests.get(url)
page
soup = BeautifulSoup(page.text, 'lxml')
soup
df = pd.DataFrame({'Plane':[''], 'Year':[''], 'S/N':[''], 'Total Time':[''], 'Price':[''], 'Location':[''], 'Description':[''], 'Tag':[''], 'Last updated':[''], 'Link':['']})
while True:
postings = soup.find_all('div', class_ = 'listing-item premium')
for post in postings:
try:
link = post.find('a', class_ = 'more-info').get('href')
link_full = 'https://www.avbuyer.com' link
plane = post.find('h2', class_ = 'item-title').text
price = post.find('div', class_ = 'price').text
location = post.find('div', class_ = 'list-item-location').text
year = post.find_all('ul', class_ = 'fa-no-bullet clearfix')[2]
year.find_all('li')[0].text
sn = post.find('ul', class_ = 'fa-no-bullet clearfix')[2]
sn.find('li')[1].text
time = post.find('ul', class_ = 'fa-no-bullet clearfix')[2]
time.find('li')[2].text
desc = post.find('div', classs_ = 'list-item-para').text
tag = post.find('div', class_ = 'list-viewing-date').text
updated = post.find('div', class_ = 'list-update').text
df = df.append({'Plane':plane, 'Year':year, 'S/N':sn, 'Total Time':time, 'Price':price, 'Location':location,
'Description':desc, 'Tag':tag, 'Last updated':updated, 'Link':link_full}, ignore_index = True)
except:
pass
next_page = soup.find('a', {'rel':'next'}).get('href')
next_page_full = 'https://www.avbuyer.com' next_page
next_page_full
url = next_page_full
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
df.to_csv('/Users/xxx/avbuyer.csv')
CodePudding user response:
Try this:
import requests
from bs4 import BeautifulSoup
import pandas as pd
headers= {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.avbuyer.com/aircraft/private-jets')
soup = BeautifulSoup(response.content, 'html.parser')
postings = soup.find_all('div', class_ = 'listing-item premium')
temp=[]
for post in postings:
link = post.find('a', class_ = 'more-info').get('href')
link_full = 'https://www.avbuyer.com' link
plane = post.find('h2', class_ = 'item-title').text
price = post.find('div', class_ = 'price').text
location = post.find('div', class_ = 'list-item-location').text
t=post.find_all('div',class_='list-other-dtl')
for i in t:
data=[tup.text for tup in i.find_all('li')]
years=data[0]
s=data[1]
total_time=data[2]
temp.append([plane,price,location,link_full,years,s,total_time])
df=pd.DataFrame(temp,columns=["plane","price","location","link","Years","S/N","Totaltime"])
print(df)
output:
plane price location link Years S/N Totaltime
0 Dassault Falcon 2000LXS Make offer North America Canada, United States - MD, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2021 S/N 377 Total Time 33
1 Cirrus Vision SF50 G1 Please call North America Canada, United States - WI, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2018 S/N 0080 Total Time 615
2 Gulfstream IV Make offer North America Canada, United States - MD, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 1990 S/N 1148 Total Time 6425
4 Boeing 787-8 Make offer Europe, Monaco, For Sale by Global Jet Monaco https://www.avbuyer.com/aircraft/private-jets/... Year 2010 S/N - Total Time 1
5 Hawker 4000 Make offer South America, Puerto Rico, For Sale by JetHQ https://www.avbuyer.com/aircraft/private-jets/... Year 2009 S/N RC-24 Total Time 2120
6 Embraer Legacy 500 Make offer North America Canada, United States - NE, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2015 S/N 55000016 Total Time 2607
7 Dassault Falcon 2000LXS Make offer North America Canada, United States - DE, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2015 S/N 300 Total Time 1909
8 Dassault Falcon 50EX Please call North America Canada, United States - TX, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2002 S/N 320 Total Time 7091.9
9 Dassault Falcon 2000 Make offer North America Canada, United States - MD, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2001 S/N 146 Total Time 6760
10 Bombardier Learjet 75 Make offer Europe, Switzerland, For Sale by Jetcraft https://www.avbuyer.com/aircraft/private-jets/... Year 2014 S/N 45-491 Total Time 1611
11 Hawker 800B Please call Europe, United Kingdom - England, For Sale by ... https://www.avbuyer.com/aircraft/private-jets/... Year 1985 S/N 258037 Total Time 9621
13 BAe Avro RJ100 Please call North America Canada, United States - MT, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 1996 S/N E3282 Total Time 45996
14 Embraer Legacy 600 Make offer North America Canada, United States - MD, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2007 S/N 14501014 Total Time 4328
15 Bombardier Challenger 850 Make offer North America Canada, United States - AZ, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2003 S/N 7755 Total Time 12114.1
16 Gulfstream G650 Please call Europe, Switzerland, For Sale by Jetcraft https://www.avbuyer.com/aircraft/private-jets/... Year 2013 S/N 6047 Total Time 2178
17 Bombardier Learjet 55 Price: USD $995,000 North America Canada, United States - MD, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 1982 S/N 020 Total Time 13448
18 Dassault Falcon 8X Please call North America Canada, United States - MD, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2016 S/N 406 Total Time 1627
19 Hawker 800XP Price: USD $1,595,000 North America Canada, United States - MD, Fo... https://www.avbuyer.com/aircraft/private-jets/... Year 2002 S/N 258578 Total Time 10169
CodePudding user response:
Right now, your try
-except
clauses are not allowing you to see and debug the errors in your script. If you remove them, you will see:
IndexError: list index out of range
in line 24. There are only two elements inside the list, and you are looking for the second one. Therefore, your line should be:year = post.find_all('ul', class_ = 'fa-no-bullet clearfix')[1]
KeyError: 2
in line 26. You are usingfind()
, which returns a<class 'bs4.element.Tag'>
object, not a list. Here you want to usefind_all()
as you did in line 24. Same happens for line 28.However, instead of using this expression three times, you should rather store the result in a variable and use it later.
AttributeError: 'NoneType' object has no attribute 'text'
in line 31. There is a type, you wrote_classs
.AttributeError: 'NoneType' object has no attribute 'text'
in line 32. There is nothing wrong with your code. Instead, there are some entries in the webpage that don't have this element. You should check if thefind
method gave you any result.tag = post.find('div', class_ = 'list-viewing-date') if tag: tag = tag.text else: tag = None
You don't have a way out of your
while
loop. You should catch whenever the script cannot find a newnext_page
and add abreak
.
After changing all this, it worked for me to scrape the first page. I used:
Python 3.9.7
bs4 4.10.0
It is very important that you state what versions of Python and the libraries you are using.
Cheers!