I am trying to extract all the items from a list on this page.
Code
import bs4, requests
import pandas as pd
wagon_stock_url = 'https://parramattamg.com.au/up4053-961230-mg-hs-2020.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}
response = requests.get(wagon_stock_url, headers=headers)
soup = bs4.BeautifulSoup(response.text, 'html.parser')
name = soup.select(".stockItemInfo")
I know soup.select(".stockItemInfo") just selects the elements with that class and returns them as a list, but how do I get each list item while iterating?
CodePudding user response:
You're close to a solution. Just add an li to your CSS selector, which will give you a result set of all the list elements:
name = soup.select(".stockItemInfo li")
--> [<li> <span><strong>Vehicle</strong></span>: 2020 MG HS </li>, <li> <span><strong>Series</strong></span>: SAS23 MY20 </li>, <li> <span><strong>Badge</strong></span>: Vibe DCT FWD </li>, <li> <span><strong>Colour</strong></span>: White </li>, <li> <span><strong>Odometer</strong></span>: 11,213kms </li>, <li> <span><strong>Body</strong></span>: Wagon </li>, <li> <span><strong>Engine</strong></span>: 1.5 litre, 4-cylinder </li>, <li> <span><strong>Fuel Type</strong></span>: Petrol </li>, <li> <span><strong>Transmission</strong></span>: 7-speed Automatic </li>, <li> <span><strong>Doors</strong></span>: 5-door </li>, <li> <span><strong>Seats</strong></span>: 5 </li>, <li> <span><strong>Trim</strong></span>: Black </li>, <li> <span><strong>VIN</strong></span>: LSJA24U92LN012249 </li>, <li> <span><strong>Registration</strong></span>: EIT61T </li>, <li> <span><strong>Stock Number</strong></span>: UP4053 </li>, <li> <span><strong>MY</strong></span>: 20 </li>]
Or get just the names as a list:
names = [x.text for x in soup.select(".stockItemInfo li strong")]
--> ['Vehicle', 'Series', 'Badge', 'Colour', 'Odometer', 'Body', 'Engine', 'Fuel Type', 'Transmission', 'Doors', 'Seats', 'Trim', 'VIN', 'Registration', 'Stock Number', 'MY']
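If you also want the value that sits next to each name, one option is to read both parts from the tags while iterating. This is only a minimal sketch: SAMPLE_HTML is a hand-made, trimmed stand-in for the page so it runs offline, and the div wrapper around the list is an assumption about the real markup.

```python
import bs4

# Hypothetical, trimmed-down copy of the markup (the <div> wrapper is assumed).
SAMPLE_HTML = """
<div class="stockItemInfo">
  <ul>
    <li> <span><strong>Vehicle</strong></span>: 2020 MG HS </li>
    <li> <span><strong>Colour</strong></span>: White </li>
    <li> <span><strong>Engine</strong></span>: 1.5 litre, 4-cylinder </li>
  </ul>
</div>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for li in soup.select(".stockItemInfo li"):
    # The label is the text of the <strong> tag.
    name = li.strong.get_text(strip=True)
    # The value is the text node right after the <span>; drop the leading colon.
    value = li.span.next_sibling.strip().lstrip(":").strip()
    rows.append((name, value))

for name, value in rows:
    print(f"{name}: {value}")
```

Reading the value from the text node after the span avoids depending on where colons appear inside the value, but it does assume every li has exactly this span-then-text layout.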
To get a list of dicts with names and values (in case you want to post-process them, for example by passing them to pd.DataFrame(data)):
data = []
for x in soup.select(".stockItemInfo li"):
    item = x.text.strip().split(':')
    data.append({
        'name': item[0],
        'value': item[1]
    })
data
Output
[{'name': 'Vehicle', 'value': ' 2020 MG HS'},
{'name': 'Series', 'value': ' SAS23 MY20'},
{'name': 'Badge', 'value': ' Vibe DCT FWD'},
{'name': 'Colour', 'value': ' White'},
{'name': 'Odometer', 'value': ' 11,213kms'},
{'name': 'Body', 'value': ' Wagon'},
{'name': 'Engine', 'value': ' 1.5 litre, 4-cylinder'},
{'name': 'Fuel Type', 'value': ' Petrol'},
{'name': 'Transmission', 'value': ' 7-speed Automatic'},
{'name': 'Doors', 'value': ' 5-door'},
{'name': 'Seats', 'value': ' 5'},
{'name': 'Trim', 'value': ' Black'},
{'name': 'VIN', 'value': ' LSJA24U92LN012249'},
{'name': 'Registration', 'value': ' EIT61T'},
{'name': 'Stock Number', 'value': ' UP4053'},
{'name': 'MY', 'value': ' 20'}]
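Note that each value in this output still carries a leading space, and splitting on every colon would cut off a value that itself contains a colon. A small sketch of one way to tidy that up, using only plain strings shaped like the li text; split_name_value is a hypothetical helper name, and the "Note" sample is made up to show the colon case:

```python
def split_name_value(text):
    """Split 'Name: value' on the first colon only and trim both parts."""
    name, _, value = text.partition(":")
    return name.strip(), value.strip()


# Sample strings shaped like the list items; the last one is invented
# to show a value that contains a colon of its own.
samples = [
    "Vehicle: 2020 MG HS",
    "Odometer: 11,213kms",
    "Note: see also: dealer",
]

# Build a name -> value lookup so single fields can be read directly.
spec = dict(split_name_value(s) for s in samples)

print(spec["Vehicle"])
print(spec["Note"])
```

A dict keyed by name is handy when you only need a few fields; it assumes the names on the page are unique.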
CodePudding user response:
The minimal working solution, so far:
Code
import bs4, requests
import pandas as pd
wagon_stock_url = 'https://parramattamg.com.au/up4053-961230-mg-hs-2020.html'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}
response = requests.get(wagon_stock_url, headers = headers)
soup = bs4.BeautifulSoup(response.text, 'html.parser')
data = []
names = soup.select(".stockItemInfo > ul > li")
for name in names:
    # Split the "Name: value" text of each list item into its two parts
    name = name.get_text(strip=True).split(':')
    Name = name[0]
    Value = name[1]
    data.append([Name, Value])
cols = ["Name", "Value"]
df = pd.DataFrame(data, columns=cols)
print(df)
# df.to_csv('info.csv', index=False)  # to store the data on your system
Output:
            Name                  Value
0        Vehicle             2020 MG HS
1         Series             SAS23 MY20
2          Badge           Vibe DCT FWD
3         Colour                  White
4       Odometer              11,213kms
5           Body                  Wagon
6         Engine  1.5 litre, 4-cylinder
7      Fuel Type                 Petrol
8   Transmission      7-speed Automatic
9          Doors                 5-door
10         Seats                      5
11          Trim                  Black
12           VIN      LSJA24U92LN012249
13  Registration                 EIT61T
14  Stock Number                 UP4053
15            MY                     20