for page in range(1, pages + 1):
def append_organizator(organizator, organizatorzy=[]):
    organizatorzy.append(organizator)
    for i in organizatorzy:
        try:
            query = "INSERT INTO stypendia (organizator) values(%s)"
            values = []
            values.append(organizatorzy.pop())
            cursor.execute(query, values)
            conn.commit()
        except:
            pass
def append_type(rodzaj, rodzaje=[]):
    rodzaje.append(rodzaj)
    for i in rodzaje:
        try:
            query = "INSERT INTO stypendia (rodzaj) values(%s)"
            values = []
            values.append(rodzaje.pop())
            cursor.execute(query, values)
            conn.commit()
        except:
            pass
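As an aside, the `organizatorzy=[]` and `rodzaje=[]` default arguments are created once, when each function is defined, so the same list is shared across every call that omits the argument. A minimal sketch of that behaviour (the `append_item` helper is hypothetical, not part of the code above):

```python
def append_item(item, items=[]):
    # the default list is built once, at definition time,
    # and reused by every call that does not pass its own list
    items.append(item)
    return items

print(append_item("a"))  # first call: ['a']
print(append_item("b"))  # same list again: ['a', 'b']
```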
Those are two functions that insert the data scraped from the website into the database.
The program iterates through all available pages on the site, and the scraped data is inserted into the database.
As you can see on the screenshot, the title is inserted 7 times (the number of pages), then the organizator another 7 times, and so on. How can I solve this problem and have everything at the same indexes?
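The symptom can be reproduced with a minimal sketch, using sqlite3 as a stand-in for the real database (the `stypendia` table name is kept from the code above; sqlite's `?` placeholder replaces MySQL's `%s`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE stypendia (organizator TEXT, rodzaj TEXT)")

# Inserting each column in a separate statement, as the two functions do,
# creates a separate row per INSERT and leaves the other column NULL
cursor.execute("INSERT INTO stypendia (organizator) VALUES (?)", ("Org A",))
cursor.execute("INSERT INTO stypendia (rodzaj) VALUES (?)", ("naukowe",))
conn.commit()

print(cursor.execute("SELECT organizator, rodzaj FROM stypendia").fetchall())
# → [('Org A', None), (None, 'naukowe')]
```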
CodePudding user response:
You need to combine the insert operations - each insert creates a new row. You should also pass the parameters directly instead of building a list first; it isn't needed.
This example only handles two parameters (the same as your code above). Add additional parameters as needed and adjust the insert statement.
def append(organizator: str, rodzaj: str):
    try:
        query = "INSERT INTO stypendia (organizator, rodzaj) values(%s, %s)"
        values = (organizator, rodzaj)
        cursor.execute(query, values)
        conn.commit()
    except Exception:
        pass

# The organization of this loop assumes the order of returned data is
# consistent: each "rodzaj" is at the same index as its "organizator"
# (as the original code assumes)
organizator = doc.find_all(class_='organizator-title')
rodzaj = doc.find_all('div', class_='fleft', string="Rodzaj:")
for i in range(min(len(organizator), len(rodzaj))):
    o = organizator[i].text.strip().replace('\n', '').replace('\r', '')
    r = rodzaj[i].find_next().text.strip().replace('\n', '').replace('\r', '')
    append(o, r)
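Again with sqlite3 as a stand-in for the real connection, the combined two-column insert puts both values on the same row, which is what keeps the data at matching indexes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE stypendia (organizator TEXT, rodzaj TEXT)")

# one INSERT with both columns -> one row with both values aligned
cursor.execute("INSERT INTO stypendia (organizator, rodzaj) VALUES (?, ?)",
               ("Org A", "naukowe"))
conn.commit()

print(cursor.execute("SELECT organizator, rodzaj FROM stypendia").fetchall())
# → [('Org A', 'naukowe')]
```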
CodePudding user response:
organizator = doc.find_all(class_='organizator-title')
for i in organizator:
    x = append_organizator(i.text.strip().replace('\n', '').replace('\r', ''))

rodzaj = doc.find_all('div', class_='fleft', string="Rodzaj:")
for i in rodzaj:
    x = append_type(i.find_next().text.strip().replace('\n', '').replace('\r', ''))
As you can see, I am iterating through all the elements found on the page and adding them to a list. How can I modify your piece of code to fit this project?