I want to scrape a website and put the desired data into a JSON file. The issue I've encountered is that I get the data as text and can only print it. But I need to write only specific fields to the JSON file and reuse the data with my classes. The website I'm scraping and my code:
import requests
from bs4 import BeautifulSoup

URL = 'https://lt.brcauto.eu/automobiliu-paieska/'
req = requests.get(URL)
soup = BeautifulSoup(req.text, 'lxml')
# The last pagination item is the ">" link, so the second-to-last one holds the highest page number
pages = soup.find_all('li', class_='page-item')[-2]
cars_printed_counter = 0
for number in range(1, int(pages.text)):
    req = requests.get(URL + '?page=' + str(number))
    soup = BeautifulSoup(req.text, 'lxml')
    if cars_printed_counter == 20:
        break
    for single_car in soup.find_all('div', class_='cars-wrapper'):
        if cars_printed_counter == 20:
            break
        Car_Title = single_car.find('h2', class_='cars__title')
        Car_Specs = single_car.find('p', class_='cars__subtitle')
        print('\nCar number:', cars_printed_counter + 1)
        print(Car_Title.text)
        print(Car_Specs.text)
        cars_printed_counter += 1
The data I get looks like this (printed results):
Car number: 19
BMW 520 Gran Turismo M-Sport
2013 | 2.0 Diesel | Automation | 255229 km | 135 kW (184 AG) | Black
Car number: 20
BMW 750 i Automation
2005 | 5.0 Gasoline | Automation | 343906 km | 270 kW (367 AG) | Grey
And the question is: how should I put the data into a JSON file so that it looks like this (desired JSON):
[
    {
        "fuel": "diesel",
        "title": "BMW 520 Gran Turismo M-Sport",
        "year": 2013,
        "run": 255229,
        "type": "Black"
    },
    {
        "fuel": "gasoline",
        "title": "BMW 750 i Automation",
        "year": 2005,
        "run": 343906,
        "type": "Grey"
    }
]
CodePudding user response:
You could do something like this: build a dict for each car and append it to a list. Check out the linked page on how to create dicts in Python.
# this list stores one dict per car; create it once, before the page loop
list_of_printed_cars = []

# this replaces the inner loop from the question
for single_car in soup.find_all('div', class_='cars-wrapper'):
    if cars_printed_counter == 20:
        break
    Car_Title = single_car.find('h2', class_='cars__title')
    Car_Specs = single_car.find('p', class_='cars__subtitle')
    # printed_car is a dictionary of the car's title and specs
    printed_car = {
        'title': Car_Title.text,
        'specs': Car_Specs.text
    }
    # this appends to the list that stores each car's title and specs
    list_of_printed_cars.append(printed_car)
    # keep counting so the 20-car limit from the question still applies
    cars_printed_counter += 1
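To get from the raw specs text to the fields in the desired JSON, the specs line still has to be split and converted. Here is a minimal sketch, assuming every listing has exactly the six "|"-separated fields shown in the printed results (year, engine, gearbox, mileage, power, color). The helper name parse_car and the output file name cars.json are my own choices for illustration, not something from the question.

```python
import json


def parse_car(title, specs):
    """Build one car dict from a title and a '|'-separated specs line.

    Assumes the six-field layout from the printed results; a listing
    with a different number of fields raises ValueError on unpacking.
    """
    parts = [part.strip() for part in specs.split('|')]
    year, engine, _gearbox, mileage, _power, color = parts
    return {
        'fuel': engine.split()[-1].lower(),             # '2.0 Diesel' -> 'diesel'
        'title': title.strip(),
        'year': int(year),
        'run': int(mileage.replace('km', '').strip()),  # '255229 km' -> 255229
        'type': color,
    }


# Sample input copied from the printed results in the question
cars = [
    parse_car('BMW 520 Gran Turismo M-Sport',
              '2013 | 2.0 Diesel | Automation | 255229 km | 135 kW (184 AG) | Black'),
    parse_car('BMW 750 i Automation',
              '2005 | 5.0 Gasoline | Automation | 343906 km | 270 kW (367 AG) | Grey'),
]

# Write the whole list as one JSON array
with open('cars.json', 'w', encoding='utf-8') as f:
    json.dump(cars, f, indent=4, ensure_ascii=False)
```

In the scraping loop you would call parse_car(Car_Title.text, Car_Specs.text) and append the result to the list instead of the raw text, then dump the list once after the loop has finished.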
CodePudding user response:
This tutorial is what you are looking for.
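For the part of the question about reusing the data with your own classes, one option is to load the JSON back and pass each dict to a class constructor. This is only a sketch: the Car dataclass and the load_cars helper are hypothetical names, and the fields simply mirror the keys of the desired JSON from the question.

```python
import json
from dataclasses import dataclass


@dataclass
class Car:
    # Hypothetical class; field names match the keys of the desired JSON
    fuel: str
    title: str
    year: int
    run: int
    type: str


def load_cars(json_text):
    """Parse a JSON array of car objects into a list of Car instances."""
    return [Car(**item) for item in json.loads(json_text)]


# Inline sample matching the desired JSON; in practice, read the text from your file
sample = '''
[
    {"fuel": "diesel", "title": "BMW 520 Gran Turismo M-Sport",
     "year": 2013, "run": 255229, "type": "Black"},
    {"fuel": "gasoline", "title": "BMW 750 i Automation",
     "year": 2005, "run": 343906, "type": "Grey"}
]
'''

cars = load_cars(sample)
print(cars[0].title, cars[0].year)
```

Car(**item) only works while the JSON keys and the class fields match exactly; an extra or missing key raises TypeError, which is a useful early warning if the scraped layout changes.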