Extracting data from web with selenium and inserting it in a pandas dataframe

Time:06-08

I have a problem: I cannot take the data I have extracted with Selenium and store it anywhere so that I can manipulate or save it.

I am grabbing the data, like so:

try:
  books = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.ID, "titleitem"))
  )
finally:
  driver.quit()

Inside the try block I extract the data like this:

for i, book in enumerate(books):
    splited = books[i].text.split("\n")
    
    writer = str(splited[0])
    title = str(splited[1])
    publiser = str(splited[2])
    country = str(splited[3])
    ISBN = str(splited[4])

So in the end I have this code to extract exactly the data I want:

try:
  books = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.ID, "titleitem"))
  )
  for i, book in enumerate(books):
    splited = books[i].text.split("\n")
    
    writer = str(splited[0])
    title = str(splited[1])
    publiser = str(splited[2])
    country = str(splited[3])
    ISBN = str(splited[4])
finally:
  driver.quit()

Those variables are the values I want to grab. When I print them, they look correct (as they appear on the website). But then I try to insert them into a pandas DataFrame like this (fake_books is declared as pd.DataFrame()):

tmp = pd.Series({'title' : title, 'author': writer, 'publiser': ekdoths})
fake_books = fake_books.append(tmp)

I have also tried a list of dictionaries:

books = [{}]

...

for i, book in enumerate(books):
    splited = books[i].text.split("\n")

    books[i]['writer'] = str(splited[0])
    books[i]['title'] = str(splited[1])
    books[i]['ekdoths'] = str(splited[2])
    books[i]['polh'] = str(splited[3])
    books[i]['ISBN'] = str(splited[4])

Neither of these approaches works; the program just lags and prints an empty DataFrame or list.

CodePudding user response:

I always use this method: I create a list of dictionaries, then pass it to pd.DataFrame.

# create an empty list at the beginning of the code
df_list = []

for book in books:
    splited = book.text.split("\n")
    
    writer = str(splited[0])
    title = str(splited[1])
    publiser = str(splited[2])
    country = str(splited[3])
    ISBN = str(splited[4])
    
    # put the scraped data into a dictionary, then append it to df_list
    df_list.append({"writer":writer, "title":title, "publiser":publiser, "country":country, "ISBN":ISBN})


# at the end of your code, after scraping everything you want
df = pd.DataFrame(df_list)
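
Collecting rows in a list also avoids DataFrame.append, which was removed in pandas 2.0. As a rough sketch of the row-collecting step on its own, the snippet below splits each element's text into named fields and skips entries that have fewer than five lines, which would otherwise raise an IndexError. The names FIELDS, parse_book, collect_rows and the sample strings are hypothetical; with Selenium, the input would presumably be something like [book.text for book in books], read before driver.quit() is called.

```python
# Field names follow the variable names used in the question (spelling kept as-is).
FIELDS = ["writer", "title", "publiser", "country", "ISBN"]


def parse_book(text):
    """Turn one element's text into a dict keyed by FIELDS.

    Returns None when the text has fewer lines than FIELDS,
    instead of failing with an IndexError.
    """
    parts = [line.strip() for line in text.split("\n")]
    if len(parts) < len(FIELDS):
        return None
    # zip stops at the shorter sequence, so extra trailing lines are ignored
    return dict(zip(FIELDS, parts))


def collect_rows(texts):
    """Build a list of row dicts, dropping entries that could not be parsed."""
    rows = []
    for text in texts:
        row = parse_book(text)
        if row is not None:
            rows.append(row)
    return rows


# Hypothetical sample data standing in for the scraped element texts
sample_texts = [
    "Author A\nTitle A\nPublisher A\nCountry A\n111",
    "Author B\nTitle B",  # incomplete entry, skipped
]
rows = collect_rows(sample_texts)
```

Passing rows to pd.DataFrame(rows) once, after the loop, should then give one row per book with the five field names as columns.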