I have a Python script that parses data from a website. It runs fine and opens both pages, but the output only contains data from the last page.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
mylist = ['93729', '75077']
for i in mylist:
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.get('https://www.comicshoplocator.com/StoreLocator')
    driver.maximize_window()
    d = driver.find_element(By.NAME, 'query')
    d.send_keys(i)
    d.send_keys(Keys.ENTER)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    for tag in soup.find_all('div', class_="LocationName"):
        print(tag.text)
My current output is:
TWENTY ELEVEN COMICS
READ COMICS
BOOMERANG COMICS
MORE FUN COMICS AND GAMES
MADNESS COMICS & GAMES
SANCTUARY BOOKS AND GAMES
Desired output is:
HEROES COMICS
TWENTY ELEVEN COMICS
READ COMICS
BOOMERANG COMICS
MORE FUN COMICS AND GAMES
MADNESS COMICS & GAMES
SANCTUARY BOOKS AND GAMES
Keep in mind that the current list of zip codes is just for testing; I need a solution that also works for a list of 10 or more items.
Thanks!
CodePudding user response:
Your code does generate output for both iterations, but separately on each pass of the loop. So I collect everything into a list and use a pandas DataFrame for the final output. As dosas stated, yes, you need to allow load time.
import time
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
# options = webdriver.ChromeOptions()
# options.add_experimental_option("detach", True)
# create the driver once and reuse it for every zip code
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))  # ,options=options
mylist = ['93729', '75077']
data = []  # results from all iterations accumulate here
for i in mylist:
    driver.get('https://www.comicshoplocator.com/StoreLocator')
    time.sleep(5)  # give the page time to load
    driver.maximize_window()
    time.sleep(3)
    d = driver.find_element(By.NAME, 'query')
    d.send_keys(i)
    d.send_keys(Keys.ENTER)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    for tag in soup.find_all('div', class_="LocationName"):
        title = tag.text
        data.append({
            'title': title
        })

df = pd.DataFrame(data)
print(df)
Output:
title
0 HEROES COMICS
1 TWENTY ELEVEN COMICS
2 READ COMICS
3 BOOMERANG COMICS
4 MORE FUN COMICS AND GAMES
5 MADNESS COMICS & GAMES
6 SANCTUARY BOOKS AND GAMES
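
If the fixed time.sleep calls feel fragile once the list grows to 10 or more zip codes, an explicit wait is an alternative. Below is a minimal sketch of the same approach using WebDriverWait; it assumes every zip code returns at least one div with class LocationName (otherwise the wait times out), and the 15-second timeout and the extra 'zip' column are my own additions, not part of the original answer.

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()

mylist = ['93729', '75077']
data = []
for i in mylist:
    driver.get('https://www.comicshoplocator.com/StoreLocator')
    # wait for the search box instead of sleeping a fixed number of seconds
    box = WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.NAME, 'query'))
    )
    box.send_keys(i)
    box.send_keys(Keys.ENTER)
    # wait until at least one result block has rendered (assumes results exist)
    WebDriverWait(driver, 15).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, 'LocationName'))
    )
    soup = BeautifulSoup(driver.page_source, 'lxml')
    for tag in soup.find_all('div', class_='LocationName'):
        data.append({'zip': i, 'title': tag.text.strip()})

driver.quit()
df = pd.DataFrame(data)
print(df)

For a longer list the loop is unchanged; if you want to keep the results, df.to_csv('shops.csv', index=False) writes them to a file (the file name here is just an example).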