How to save scraped data to JSON in Python

Time:04-20

I want to scrape a website and put the data I need into a JSON file. The problem is that I only get text, which I can print but nothing more. I need to add only specific data to the JSON file and reuse that data with my classes. Here is the website I'm scraping and my code:

import requests
from bs4 import BeautifulSoup

URL = 'https://lt.brcauto.eu/automobiliu-paieska/'

req = requests.get(URL)
soup = BeautifulSoup(req.text, 'lxml')

# The last pagination item is the ">" arrow, so the second-to-last one holds the highest page number
pages = soup.find_all('li', class_='page-item')[-2]

cars_printed_counter = 0

# + 1 so the last page is included (range() excludes its upper bound)
for number in range(1, int(pages.text) + 1):
    req = requests.get(URL + '?page=' + str(number))
    soup = BeautifulSoup(req.text, 'lxml')

    if cars_printed_counter == 20:
        break

    for single_car in soup.find_all('div', class_='cars-wrapper'):

        if cars_printed_counter == 20:
            break

        Car_Title = single_car.find('h2', class_='cars__title')
        Car_Specs = single_car.find('p', class_='cars__subtitle')

        print('\nCar number:', cars_printed_counter + 1)
        print(Car_Title.text)
        print(Car_Specs.text)

        cars_printed_counter += 1

The printed output looks like this:

Car number: 19

BMW 520 Gran Turismo M-Sport

2013 | 2.0 Diesel | Automation | 255229 km | 135 kW (184 AG) | Black

Car number: 20

BMW 750 i Automation

2005 | 5.0 Gasoline | Automation | 343906 km | 270 kW (367 AG) | Grey

So my question is: how can I put this data into a JSON file so that it looks like this?

[
{
    "fuel": "diesel",
    "title": "BMW 520 Gran Turismo M-Sport",
    "year": 2013,
    "run": 255229,
    "type": "Black"
},
{
    "fuel": "gasoline",
    "title": "BMW 750 i Automation",
    "year": 2005,
    "run": 343906,
    "type": "Grey"
}
]
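One way to get from the printed lines to that shape is to split the spec line on `|` and pick out the fields by position. The sketch below assumes every spec line has the same six-part layout as the two examples above (year, engine and fuel, gearbox, mileage, power, colour); the helper name `to_car_dict` is hypothetical, and the `"type"` key simply mirrors the desired JSON, where it holds the colour.

```python
import json


def to_car_dict(title: str, specs: str) -> dict:
    """Map one title + spec line onto the keys of the desired JSON.

    Assumes (hypothetically) that every spec line looks like
    '2013 | 2.0 Diesel | Automation | 255229 km | 135 kW (184 AG) | Black'.
    """
    # Split into the six pipe-separated parts and strip surrounding spaces
    year, engine, _gearbox, mileage, _power, colour = (
        part.strip() for part in specs.split('|')
    )
    return {
        # '2.0 Diesel' -> 'diesel' (last word of the engine field)
        'fuel': engine.split()[-1].lower(),
        'title': title.strip(),
        'year': int(year),
        # '255229 km' -> 255229
        'run': int(mileage.split()[0]),
        # The desired JSON stores the colour under "type"
        'type': colour,
    }


cars = [
    to_car_dict('BMW 520 Gran Turismo M-Sport',
                '2013 | 2.0 Diesel | Automation | 255229 km | 135 kW (184 AG) | Black'),
    to_car_dict('BMW 750 i Automation',
                '2005 | 5.0 Gasoline | Automation | 343906 km | 270 kW (367 AG) | Grey'),
]

# indent=4 gives the pretty-printed layout shown above
print(json.dumps(cars, indent=4))
```

Inside the scraping loop this would be called as `to_car_dict(Car_Title.text, Car_Specs.text)`. If a listing ever has a different number of parts, the unpacking raises `ValueError`, which at least makes the layout change visible instead of silently producing wrong fields.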

Answer:

You could do something like this. Check out this link on how to create dicts in Python.

# this is going to store your dicts of cars
# (create it once, before the page loop, so cars from every page are kept)
list_of_printed_cars = []

for single_car in soup.find_all('div', class_='cars-wrapper'):

    if cars_printed_counter == 20:
        break

    Car_Title = single_car.find('h2', class_='cars__title')
    Car_Specs = single_car.find('p', class_='cars__subtitle')

    # printed_car is a dictionary of the car's title and specs
    printed_car = {
        'title': Car_Title.text,
        'specs': Car_Specs.text
    }

    # this appends to a list that stores each car's title and specs
    list_of_printed_cars.append(printed_car)

    # count the car so the 20-car limit above still works
    cars_printed_counter += 1

Answer:

This tutorial is what you are looking for.
