I have made a web scraper and would like to insert this into a database. Until now I wrote everything into an Excel and was working fine. But now I am not able to insert the data into a DB, it is only inserting the last record of the first loop (carname etc etc). All the data of the for loops are being completely printed in my screen but not in my DB.
This is my code, can anyone tell me how I can all data into the DB aswell.
for cars in car_names:
print(cars.text)
for cars2 in car_names2:
print(cars2.text)
for price in prices:
print(price.text)
for price2 in prices2:
print(price2.text)
for image in images:
print(image.get_attribute('src'))
print(carnamelist)
print(location)
print(len(images))
insert_query = f'''INSERT INTO cardata (CarName, CarModel, ImageUrl, FullPrice, Price, Location)
VALUES ('{cars.text}', '{cars2.text}', '{image.get_attribute('src')}', '{price.text}', '{price2.text}', '{location}');
'''
cursor.execute(insert_query)
connection.commit()
CodePudding user response:
You're iterating on all the collections to print their values, but not to insert into the database. Because of Python's (lax) scoping rules the iteration variables remain available after the iteration, therefore you can perform one insert using the last value of each scope, but that's it.
In order to insert the records you need to actually create the records and insert them e.g.
for name, name2, price, price2, image in zip(car_names, car_names2, prices, prices2, images):
cursor.execute(
"insert into cardata (carname, carmodel, imageurl, fullprice, price, location) values (?, ?, ?, ?, ?, ?)",
(name.text, name2.text, image.getattribute('src'), price.text, price2.text, location)
)
Do note the use of parameter substitution: in your original code, if the values contain any sort of SQL metacharacter (a single quote, usually) your query will break. This also opens you up to sql injection issues (which the sites you're scraping could in fact use as defensive measures).
By using placeholders, the database API can know that these are "external values" and properly handle them internally however it prefers.