web scraping results not saving

Time:07-05

On every new results page, the index numbers and the data written to the txt file are reset. I want to append the data to the txt file page after page, without resetting.

Apart from these problems, the code works.

Code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pandas as pd
import time
import csv
import re


driver = webdriver.Chrome()

url ="http://google.com"
driver.get(url)

searchInput = driver.find_element("xpath",'/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input')
time.sleep(1)
searchInput.send_keys("dişçi")
time.sleep(2)
searchInput.send_keys(Keys.ENTER)
time.sleep(2)
result = driver.page_source
result = driver.find_elements(By.CSS_SELECTOR, ".GyAeWb cite.iUh30")

for index, element in enumerate(result):
    print(index + 1, element.text)

result = []
result = list(set(result))
 
time.sleep(2)

try:
    while 1:
        nextInput = driver.find_element(By.XPATH,'//*[@id="pnnext"]/span[2]').click()
        result = driver.find_elements(By.CSS_SELECTOR, ".GyAeWb cite.iUh30")

        for index, elements in enumerate(result):
            print(index + 1, elements.text)
        
        count = 1
        with open("siteler.txt","w",encoding="UTF-8") as file:
            for item in result:
                file.write(f"{count}-{item}\n")
                count += 1
      
except Exception as e:
    print(e)

finally:
    print("there is no element with '//*[@id='pnnext']/span[2]' XPATH")


driver.close()

This is what ends up in the file:

1-<selenium.webdriver.remote.webelement.WebElement (session="488459ee0c4c000e56111d8e98626783", element="35944ab0-591a-48e6-834f-93e3a75a9192")>

CodePudding user response:

As already stated in my comment: use 'a' (append) instead of 'w' (write, which truncates the file) as the mode when opening the file:

with open("siteler.txt", "a") as file:
   ...

See also "How do I append to a file?", since this question might be a duplicate.
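Here is a minimal sketch of how the append mode could be combined with a counter that keeps running across pages. The helper name `append_results` and the sample strings are made up for illustration; in the scraper you would pass something like `[e.text for e in result]`. Passing the text rather than the WebElement itself is also what should avoid lines like `<selenium.webdriver.remote.webelement.WebElement ...>` in the file, since formatting the element object writes its repr.

```python
import os
import tempfile


def append_results(path, texts, start=1):
    """Append numbered lines to `path` and return the next free number.

    Hypothetical helper, not part of Selenium.
    """
    count = start
    # "a" adds to the end of the file; "w" would empty it on every call.
    with open(path, "a", encoding="UTF-8") as file:
        for text in texts:
            file.write(f"{count}-{text}\n")
            count += 1
    return count


if __name__ == "__main__":
    # Stand-in data for two result pages (assumed values, not real output).
    path = os.path.join(tempfile.mkdtemp(), "siteler.txt")
    count = append_results(path, ["https://a.example", "https://b.example"])
    count = append_results(path, ["https://c.example"], start=count)
    with open(path, encoding="UTF-8") as file:
        print(file.read())
```

In the loop from the question, `count = 1` would move above the `while`, and each page would call `count = append_results("siteler.txt", [e.text for e in result], start=count)`. Note that with 'a' the file also keeps growing between runs, so you may want to delete or empty it once before the scraping starts.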
