Append parsed data

Time:07-07

I have Python code that parses data from a website. It works fine and the script opens both pages, but the output contains data only from the last page.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

mylist = ['93729', '75077']

for i in mylist:
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.get('https://www.comicshoplocator.com/StoreLocator')
    driver.maximize_window()
    d = driver.find_element(By.NAME, 'query')
    d.send_keys(i)
    d.send_keys(Keys.ENTER)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    for tag in soup.find_all('div', class_="LocationName"):
        print(tag.text)

My current output is:

TWENTY ELEVEN COMICS
READ COMICS
BOOMERANG COMICS
MORE FUN COMICS AND GAMES
MADNESS COMICS & GAMES
SANCTUARY BOOKS AND GAMES

Desired output is:

HEROES COMICS
TWENTY ELEVEN COMICS
READ COMICS
BOOMERANG COMICS
MORE FUN COMICS AND GAMES
MADNESS COMICS & GAMES
SANCTUARY BOOKS AND GAMES

Keep in mind that the current list of zip codes is only for testing. I need a general solution that works for a list of 10 or more items.

Thanks!

CodePudding user response:

Each iteration produces its own output separately, so I collect the results from every iteration in one list and build a pandas DataFrame for the final output. As dosas stated, the page also needs time to load.

import time
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

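# Optional: keep the browser open after the script finishes.
# options = webdriver.ChromeOptions()
# options.add_experimental_option("detach", True)

# Start one browser session and reuse it for every zip code.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))  # , options=options

mylist = ['93729', '75077']
data = []  # accumulates results from all iterations

for i in mylist:
    driver.get('https://www.comicshoplocator.com/StoreLocator')
    time.sleep(5)  # give the page time to load
    driver.maximize_window()
    time.sleep(3)
    d = driver.find_element(By.NAME, 'query')
    d.send_keys(i)
    d.send_keys(Keys.ENTER)
    soup = BeautifulSoup(driver.page_source, 'lxml')

    # Collect every store name found on the results page.
    for tag in soup.find_all('div', class_="LocationName"):
        data.append({'title': tag.text})

# Build the DataFrame once, after all zip codes have been processed.
df = pd.DataFrame(data)
print(df)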

Output:

                       title
0              HEROES COMICS
1       TWENTY ELEVEN COMICS
2                READ COMICS
3           BOOMERANG COMICS
4  MORE FUN COMICS AND GAMES
5     MADNESS COMICS & GAMES
6  SANCTUARY BOOKS AND GAMES
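
The fixed time.sleep() calls above are a guess at how long the page needs. A possible alternative is to poll until the results actually appear. Below is a minimal, browser-free sketch of that idea. The names wait_until and collect_titles, and the search and read_titles callables, are hypothetical and not part of Selenium or of the code above. Selenium also ships its own explicit-wait helper (WebDriverWait with expected_conditions), which may be the better choice in a real script.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


def collect_titles(zip_codes, search, read_titles, timeout=10.0, poll=0.5):
    """Run one search per zip code and gather all titles in a single list.

    `search(zip_code)` submits the query; `read_titles()` returns the titles
    currently visible (an empty list while the page is still loading).
    Both are hypothetical callables supplied by the caller.
    """
    data = []
    for code in zip_codes:
        search(code)
        try:
            titles = wait_until(read_titles, timeout=timeout, poll=poll)
        except TimeoutError:
            titles = []  # no results appeared for this zip code
        data.extend({'title': t} for t in titles)
    return data
```

With Selenium, search could wrap the driver.get(...) and send_keys(...) calls from the loop above, and read_titles could parse driver.page_source with BeautifulSoup and return the text of each div with class "LocationName". This assumes the page is reloaded for every zip code, so results from the previous search are no longer present when polling starts. The returned list can be passed to pd.DataFrame exactly as before.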