I'm trying to write a Python script that separates websites into two kinds, those with an SPF record and those without, and classifies them. It worked perfectly at first, but lately it prints an error message that I can't find any clue about.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from concurrent.futures import ThreadPoolExecutor
import re
import requests
import json
from datetime import datetime
from colorama import Fore, Back, Style
import colorama
from webdriver_manager.chrome import ChromeDriverManager
colorama.init()
def checkCaptchaPresent(driver):
    # Keep the window on screen while a captcha form is present; hide it again once it is gone
    captchaFound = True
    while captchaFound:
        try:
            driver.find_element_by_id("captcha-form")
            driver.set_window_position(0, 0)
        except:
            driver.set_window_position(20000, 0)
            captchaFound = False
    return 0
def requestSPF(url):
    # Returns [url] when the domain has no SPF record, otherwise an empty list
    response = requests.get("https://api.sparkpost.com/api/v1/messaging-tools/spf/query?domain={}".format(url)).json()
    for error in response['results']['errors']:
        if "does not have an SPF record" in error['message']:
            print(Fore.RED + "{} does not have an SPF record".format(url))
            return [url]
    print(Fore.GREEN + "{} have an SPF record".format(url))
    return []
chrome_options = Options()
PATH = "webdriver/chromedriver.exe"
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1000,1000')
chrome_options.add_argument('log-level=3')
# chrome_options.add_argument("--user-data-dir=SFPInspector")
while True:
    driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
    driver.set_window_position(20000, 0)
    search_term = input("Enter search term: ")
    number_results = int(input("Max number of url to scrape: "))
    language_code = "en"
    driver.get('https://www.google.com/search?q={}&num={}&hl={}'.format(search_term, number_results + 1, language_code))
    print('https://www.google.com/search?q={}&num={}&hl={}'.format(search_term, number_results + 1, language_code))
    checkCaptchaPresent(driver)
    urls = driver.find_elements_by_xpath("//div[@id='search']/div/div/div[@class='g']//div[@class='yuRUbf']/a")
    # Collect every result link
    websiteLink = []
    for url in urls:
        scrappedURL = url.get_attribute('href')
        print(scrappedURL)
        websiteLink.append(scrappedURL)
    # Keep only the scheme and host part of .com URLs, then strip it down to the bare domain
    filteredURL = []
    for i, url in enumerate(websiteLink):
        match = re.compile("^http.*com[/]")
        matchedURL = match.findall(url)
        filteredURL += matchedURL
    filteredURL = [url.replace('https:', '').replace('http:', '').replace('/', '') for url in filteredURL]
    # Query the SPF API in parallel and gather the domains without a record
    noSPFURL = []
    with ThreadPoolExecutor(max_workers=int(10)) as pool:
        for res in pool.map(requestSPF, filteredURL):
            noSPFURL += res
    print(Style.RESET_ALL)
    driver.close()
    fileName = datetime.now().strftime("%d%m%Y-%H%M")
    print("Saving reports: report/{}_AllSite_{}.txt".format(''.join(e for e in search_term if e.isalnum()), fileName))
    with open('report/{}_AllSite_{}.txt'.format(''.join(e for e in search_term if e.isalnum()), fileName), 'w') as filehandle:
        for link in websiteLink:
            filehandle.write("{}\n".format(link))
    print("Saving reports: report/{}_NoSPF_{}.txt".format(''.join(e for e in search_term if e.isalnum()), fileName))
    with open('report/{}_NoSPF_{}.txt'.format(''.join(e for e in search_term if e.isalnum()), fileName), 'w') as filehandle:
        for link in noSPFURL:
            filehandle.write("{}\n".format(link))
The output message is as follows:
====== WebDriver manager ======
Could not get version for google-chrome with the command: powershell "$ErrorActionPreference='silentlycontinue' ; (Get-Item -Path "$env:PROGRAMFILES\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion ; if (-not $? -or $? -match $error) { (Get-Item -Path "$env:PROGRAMFILES(x86)\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion } if (-not $? -or $? -match $error) { (Get-Item -Path "$env:LOCALAPPDATA\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion } if (-not $? -or $? -match $error) { reg query "HKCU\SOFTWARE\Google\Chrome\BLBeacon" /v version } if (-not $? -or $? -match $error) { reg query "HKLM\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Google Chrome" /v version }"
Current google-chrome version is UNKNOWN
Get LATEST chromedriver version for UNKNOWN google-chrome
Trying to download new driver from https://chromedriver.storage.googleapis.com/98.0.4758.102/chromedriver_win32.zip
Driver has been saved in cache [C:\Users\dell\.wdm\drivers\chromedriver\win32\98.0.4758.102]
Enter search term:
CodePudding user response:
This error message...
====== WebDriver manager ======
Could not get version for google-chrome with the command: powershell "$ErrorActionPreference='silentlycontinue' ; (Get-Item -Path "$env:PROGRAMFILES\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion ; if (-not $? -or $? -match $error) { (Get-Item -Path "$env:PROGRAMFILES(x86)\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion } if (-not $? -or $? -match $error) { (Get-Item -Path "$env:LOCALAPPDATA\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion } if (-not $? -or $? -match $error) { reg query "HKCU\SOFTWARE\Google\Chrome\BLBeacon" /v version } if (-not $? -or $? -match $error) { reg query "HKLM\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Google Chrome" /v version }"
Current google-chrome version is UNKNOWN
Get LATEST chromedriver version for UNKNOWN google-chrome
...implies that Webdriver Manager was unable to retrieve the version of the installed Google Chrome browser through any of the following PowerShell commands and registry queries:
- (Get-Item -Path "$env:PROGRAMFILES\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion
- (Get-Item -Path "$env:PROGRAMFILES(x86)\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion
- (Get-Item -Path "$env:LOCALAPPDATA\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion
- reg query "HKCU\SOFTWARE\Google\Chrome\BLBeacon" /v version
- reg query "HKLM\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Google Chrome" /v version
As a result, Webdriver Manager could not match the driver to the browser and downloaded the latest version of ChromeDriver instead, i.e. v98.0.4758.102.
Solution
Uninstall the older version of Google Chrome and reinstall a fresh and updated version of Google Chrome at the default location.
CodePudding user response:
If your Chrome is installed under C:\Program Files (x86), then read the following.
The issue for me was in the invoked PowerShell command:
(Get-Item -Path "$env:PROGRAMFILES(x86)\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion
Try to invoke it yourself and you'll get:
Get-Item : Cannot find path 'C:\Program Files(x86)\Google\Chrome\Application\chrome.exe' because it does not exist.
At line:1 char:2
+ (Get-Item -Path "$env:PROGRAMFILES(x86)\Google\Chrome\Application\chr ...
+  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\Program File...tion\chrome.exe:String) [Get-Item], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetItemCommand
The problem lies in a missing space between Program Files and (x86): the path should be C:\Program Files (x86) and not C:\Program Files(x86) (no such folder exists on Windows).
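To make the missing space concrete, here is a minimal Python sketch that mimics the string expansion. It is an illustration only, not webdriver_manager's code. It assumes that PowerShell reads $env:PROGRAMFILES(x86) inside double quotes as the variable $env:PROGRAMFILES followed by the literal text (x86), which matches the path shown in the error above. The function names and the sample environment values are hypothetical.

```python
# Illustration only: plain string handling, no PowerShell involved.
CHROME_SUFFIX = r"\Google\Chrome\Application\chrome.exe"


def expanded_as_logged(env):
    # Mimics the logged command: value of ProgramFiles, then the literal "(x86)".
    return env["ProgramFiles"] + "(x86)" + CHROME_SUFFIX


def expanded_from_x86_variable(env):
    # Uses the dedicated ProgramFiles(x86) environment variable instead.
    return env["ProgramFiles(x86)"] + CHROME_SUFFIX


# Hypothetical values, typical of a 64-bit Windows installation.
sample_env = {
    "ProgramFiles": r"C:\Program Files",
    "ProgramFiles(x86)": r"C:\Program Files (x86)",
}

print(expanded_as_logged(sample_env))          # path without the space
print(expanded_from_x86_variable(sample_env))  # path with the space
```

On a real 64-bit Windows machine, os.environ.get("ProgramFiles(x86)") should give the second base folder, so the same comparison can be run against the actual environment.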
The tricky part is that in the source code file, the space actually exists between the environment variable PROGRAMFILES and (x86):
r'(Get-Item -Path "$env:PROGRAMFILES (x86)\Google\Chrome\Application\chrome.exe").VersionInfo.FileVersion',
I think raw-stringifying (r') that command somehow wiped out the space, although a raw string normally only changes how backslashes are handled, so this is only a guess.
Side note: a possible fix is on its way to being merged. The author of the pull request mentioned in a comment there:
As one can see, the reading the version from the registry is about 22.000 times faster that spawning powershell process to do the same thing. More importantly, the read chrome version will be correct (and not UNKNOWN)
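For illustration, reading the version from the registry in Python could look like the sketch below. It is not the code from the pull request. It assumes the HKCU\SOFTWARE\Google\Chrome\BLBeacon key and its version value (the same ones queried with reg query in the log above) and uses the standard-library winreg module. The function name is hypothetical, and it returns None on non-Windows systems or when the key is missing.

```python
import sys


def read_chrome_version_from_registry():
    """Return Chrome's version string from the registry, or None if unavailable."""
    if sys.platform != "win32":
        return None  # winreg only exists on Windows

    import winreg

    try:
        # Same key and value name as the "reg query" fallback in the log.
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER,
                            r"SOFTWARE\Google\Chrome\BLBeacon") as key:
            value, _value_type = winreg.QueryValueEx(key, "version")
            return str(value)
    except OSError:
        # Key or value not found, e.g. Chrome is not installed for this user.
        return None


if __name__ == "__main__":
    print(read_chrome_version_from_registry())
```

If this prints None on a machine where Chrome is installed, the registry fallback would not help there either, and reinstalling Chrome at a default location remains the simpler route.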
A simple solution is to uninstall Google Chrome and reinstall it under one of the other paths listed in the log (e.g. C:\Program Files).