performing multiple web scrapes on the same page from a list? python

Time:03-23

Hello everyone, my web scrape is almost done and I'm trying to figure out the last step: performing the whole sequence (web scrape, save to a data frame, save to Excel) over a list of search values.

For example, here's the code:

import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# driver = webdriver.Chrome() (or similar) created earlier
driver.get("website")
wait = WebDriverWait(driver, 20)
wait.until(EC.visibility_of_element_located((By.ID, "Search"))).send_keys("BC-9700021-1")

driver.switch_to.default_content()

wait.until(EC.visibility_of_element_located((By.ID, "Submit"))).click()

order_list = []
order_info = {}

soup = BeautifulSoup(driver.page_source, 'html.parser')

# Match only the <span> tags whose text is one of the labels we want
def correct_tag(tag):
    return tag.name == "span" and tag.get_text(strip=True) in {
        "Order Amount",
        "Item Name",
        "Date",
        "Warehouse Number",
    }

# For each label span, grab the text node that follows it
for t in soup.find_all(correct_tag):  # was soup1, which is undefined
    order_info[t.get_text(strip=True)] = t.find_next_sibling(string=True).strip()

order_list.append(order_info)

order_df1 = pd.DataFrame(order_list)
order_df1.to_excel('Order_sheet.xlsx', index=False)  # ExcelWriter().save() is deprecated

Output:

    Order Amount: 7000
    Item Name: Plastic Cup
    Date: 7/1/2022
    Warehouse Number: 000718
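The label/value extraction can be checked offline against a small HTML snippet. The markup below is hypothetical, just mirroring the structure the code assumes: each label `<span>` followed directly by a text node holding the value.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the page structure
html = """
<div>
  <span>Order Amount</span> 7000
  <span>Item Name</span> Plastic Cup
  <span>Date</span> 7/1/2022
  <span>Warehouse Number</span> 000718
</div>
"""

LABELS = {"Order Amount", "Item Name", "Date", "Warehouse Number"}

def correct_tag(tag):
    # Match only the <span> tags whose text is one of the wanted labels
    return tag.name == "span" and tag.get_text(strip=True) in LABELS

soup = BeautifulSoup(html, "html.parser")
order_info = {}
for t in soup.find_all(correct_tag):
    # The value is the text node immediately after the label span
    order_info[t.get_text(strip=True)] = t.find_next_sibling(string=True).strip()

print(order_info)
# → {'Order Amount': '7000', 'Item Name': 'Plastic Cup',
#    'Date': '7/1/2022', 'Warehouse Number': '000718'}
```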

But at the very top, where I type "BC-9700021-1" into the search box, I want to be able to pull the search value from a list saved in Excel. So the Excel sheet would have a list like this:

BC-9700021-1
BC-9700024-1
BC-9700121-2
ETC.
ETC.

How could I get my program to perform the same steps as the first search for the rest of the values, without having to manually change the send_keys value every time?

Any help would be greatly appreciated.

CodePudding user response:

Are you not familiar with for loops? Just iterate through each of those search items.

Also, you can use Selenium here, but there's a good chance you can get the data via an API. We won't know unless you share the URL/site.

a_list = ['BC-9700021-1', 'BC-9700024-1', 'BC-9700121-2']

# Define the tag filter once, outside the loop
def correct_tag(tag):
    return tag.name == "span" and tag.get_text(strip=True) in {
        "Order Amount",
        "Item Name",
        "Date",
        "Warehouse Number",
    }

order_list = []

for eachId in a_list:
    driver.get("website")
    wait = WebDriverWait(driver, 20)
    wait.until(EC.visibility_of_element_located((By.ID, "Search"))).send_keys(eachId)

    driver.switch_to.default_content()

    wait.until(EC.visibility_of_element_located((By.ID, "Submit"))).click()

    soup = BeautifulSoup(driver.page_source, 'html.parser')

    # Start a fresh dict each iteration; otherwise every row appended to
    # order_list points at the same dict and you end up with identical rows
    order_info = {}
    for t in soup.find_all(correct_tag):  # was soup1, which is undefined
        order_info[t.get_text(strip=True)] = t.find_next_sibling(string=True).strip()

    order_list.append(order_info)

order_df1 = pd.DataFrame(order_list)
order_df1.to_excel('Order_sheet.xlsx', index=False)  # ExcelWriter().save() is deprecated
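And since you want to pull the IDs from an Excel sheet rather than a hard-coded list, you can load the column with pandas. A minimal sketch, assuming the IDs sit in the first column with no header row (the filename `search_ids.xlsx` is an assumption; adjust to your sheet):

```python
import pandas as pd

# For demonstration only: create the hypothetical ID sheet first.
# In practice the file already exists on disk.
pd.DataFrame(['BC-9700021-1', 'BC-9700024-1', 'BC-9700121-2']).to_excel(
    'search_ids.xlsx', index=False, header=False)

# Read the first column back as the list of search values
ids_df = pd.read_excel('search_ids.xlsx', header=None)
a_list = ids_df[0].astype(str).tolist()
print(a_list)
# → ['BC-9700021-1', 'BC-9700024-1', 'BC-9700121-2']
```

Then the for loop above iterates over `a_list` exactly as before.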