I am trying to scrape "https://www.futbol24.com/" and I am being recognised as a bot. I have tried everything I could find, including removing the signatures from the JavaScript in chromedriver.exe, changing the user agent and proxy, and experimenting with various chrome_options.
However, I can reach the website if I simply use Chrome, while it always fails whenever I use chromedriver instead. I think something in the headers may tell the website whether I am accessing it by script or not. However, it seems to be impossible (or quite difficult) to change the headers.
I am not an expert in networking, so there may be a solution I have not found yet. Can somebody help me with that?
CodePudding user response:
You can use undetected-chromedriver.
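For reference, undetected-chromedriver is a Python package (installed with `pip install undetected-chromedriver`, imported as `undetected_chromedriver`), so it only helps if your scraper is written in Python. Below is a minimal sketch of how it is typically used, assuming Chrome is installed locally. The names `TARGET_URL` and `open_page` are hypothetical, chosen only for this illustration, and there is no guarantee that this is enough to get past the checks on this particular site.

```python
# Minimal sketch, not a definitive implementation.
# Assumes: Chrome is installed and `pip install undetected-chromedriver` has been run.

TARGET_URL = "https://www.futbol24.com/"  # the site from the question


def open_page(url=TARGET_URL):
    """Start a patched Chrome session and load `url`.

    `open_page` is a hypothetical helper for this example. The import is
    done inside the function so that merely defining the helper does not
    require the package or start a browser.
    """
    import undetected_chromedriver as uc

    # uc.Chrome() fetches and patches a matching chromedriver, then
    # returns an object with the usual Selenium WebDriver interface.
    driver = uc.Chrome()
    driver.get(url)
    return driver
```

You would call `driver = open_page()`, read `driver.title` or `driver.page_source` to see whether the real page loaded, and finish with `driver.quit()`. Running `driver.execute_script("return navigator.webdriver")` is one quick way to check whether the browser still advertises itself as automated.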
CodePudding user response:
Try this. It seems to work without any problem:
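using System.Collections.Generic;
using OpenQA.Selenium.Chrome;

// Hide the chromedriver console window.
var service = ChromeDriverService.CreateDefaultService();
service.HideCommandPromptWindow = true;

var options = new ChromeOptions();
options.AcceptInsecureCertificates = true;
// Path to the Chrome binary to launch (adjust to your installation).
options.BinaryLocation = @"Chrome\Chrome\Chrome.exe";
// Reuse a persistent profile so cookies and site data survive between runs.
options.AddArgument("user-data-dir=/path/to/your/custom/profilotest");
options.AddArguments(new List<string>() { "no-sandbox", "disable-gpu" });
//options.AddArgument("headless");
// Note: the two "browser.*" settings below look like Firefox preference names
// and probably have no effect in Chrome.
options.AddUserProfilePreference("browser.download.useDownloadDir", true);
options.AddArguments("--browser.helperApps.neverAsk.saveToDisk=pdf");
// Download PDFs instead of opening them in Chrome's built-in viewer.
options.AddUserProfilePreference("plugins.always_open_pdf_externally", true);

var driver = new ChromeDriver(service, options);
driver.Navigate().GoToUrl("https://www.futbol24.com");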