I have looked everywhere for a solution (including old Stack Overflow posts about related issues) to get rid of the "JavaScript is not available" output. The requests library returns this for dynamic sites, so I switched to Selenium, but I still get the same message. Does anybody know how to fix this so that it's possible to scrape dynamic sites? I simply want to retrieve the text from them. I've exhausted every approach I could find, so feel free to suggest changes to my code or a different solution.
Console output: JavaScript is not available. We’ve detected that JavaScript is disabled in this browser. Please enable JavaScript or switch to a supported browser to continue using twitter.com. You can see a list of supported browsers in our Help Center. Help Center
Below is my code:
import time
from selenium import webdriver
from bs4 import BeautifulSoup
browser = webdriver.Chrome('chromedriver')
options = webdriver.ChromeOptions()
options.headless = True
options.add_argument('--enable-javascript')
options.add_argument("--headless")
browser.get("https:/www.twitter.com/")
time.sleep(2)
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')
L = soup.getText()
time.sleep(2)
print(L)
Answer:
Your URL is incorrect: "https:/www.twitter.com/" is missing a slash after the scheme. It should be https://twitter.com/
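A quick way to catch this kind of typo before passing a URL to browser.get() is to parse it with the standard library's urllib.parse: with only one slash after the scheme, the host ends up in the path and the network location is empty. This is a minimal sketch; require_host is a hypothetical helper, not part of Selenium.

```python
from urllib.parse import urlparse


def require_host(url: str) -> str:
    """Return url unchanged if it has an http(s) scheme and a host, else raise.

    require_host is a hypothetical helper used only for illustration.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        # "https:/www.twitter.com/" parses with netloc == "" and
        # path == "/www.twitter.com/", so it is rejected here.
        raise ValueError(f"malformed URL (no host): {url!r}")
    return url


print(require_host("https://twitter.com/"))  # prints https://twitter.com/
try:
    require_host("https:/www.twitter.com/")
except ValueError as exc:
    print(exc)  # prints: malformed URL (no host): 'https:/www.twitter.com/'
```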
Twitter uses bot-detection technology, and when you use Selenium, it checks the browser for data that reveals it is being automated.
Basically, all you need to do is change the cdc_ string inside the chromedriver binary.
The same question has been asked before: link
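The change usually described for the cdc_ trick is to patch a copy of the chromedriver binary so that every occurrence of the cdc_ prefix is replaced by another string of exactly the same length (a different length would shift the following bytes and corrupt the executable). This is a minimal sketch, assuming your driver build still contains the literal cdc_ prefix; the function name patch_cdc and the replacement xyz_ are hypothetical choices. It always writes to a copy and leaves the original driver untouched.

```python
import shutil
from pathlib import Path


def patch_cdc(src: str, dst: str, replacement: bytes = b"xyz_") -> int:
    """Copy the driver at src to dst and replace every b"cdc_" in the copy.

    Hypothetical helper; assumes the driver embeds the literal b"cdc_".
    Returns the number of occurrences that were replaced.
    """
    if len(replacement) != len(b"cdc_"):
        # A different length would shift every following byte of the binary.
        raise ValueError("replacement must be exactly 4 bytes")
    shutil.copy2(src, dst)  # copy2 keeps the executable permission bits
    data = Path(dst).read_bytes()
    count = data.count(b"cdc_")
    Path(dst).write_bytes(data.replace(b"cdc_", replacement))
    return count
```

You would then point Selenium at the patched copy, for example Service("chromedriver_patched"), instead of the original driver.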
Answer:
JavaScript is enabled by default in all browsers unless you have explicitly disabled it. In this use case, it seems that the Google Chrome browsing context launched by Selenium-driven ChromeDriver is being detected as a bot.
However, I was able to retrieve the page source with a few tweaks, as follows:
Code Block:
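from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument("--headless")  # same effect as the older options.headless = True
options.add_argument("start-maximized")
# drop the --enable-automation switch (it shows the "controlled by automated test software" bar)
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# keep Blink from setting navigator.webdriver to true
options.add_argument('--disable-blink-features=AutomationControlled')
# replace the default headless user agent, which contains "HeadlessChrome"
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://twitter.com/")
print(driver.page_source)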
Console Output:
<html dir="ltr" lang="en-GB" style="overflow-y: scroll; overscroll-behavior-y: none; font-size: 15px;"><head><style>input::placeholder { user-select: none; -webkit-user-select: none; }</style><style>@font-face { font-family: TwitterChirpExtendedHeavy; src: url(https://abs.twimg.com/fonts/v1/chirp-extended-heavy-web.woff2) format('woff2'); src: url(https://abs.twimg.com/fonts/v1/chirp-extended-heavy-web.woff) format('woff'); font-weight: 800; font-style: 'normal'; font-display: 'swap'; } @font-face { font-family: TwitterChirp; src: url(https://abs.twimg.com/fonts/v2/chirp-regular-web.woff2) format('woff2'); src: url(https://abs.twimg.com/fonts/v2/chirp-regular-web.woff) format('woff'); font-weight: 400; font-style: 'normal'; font-display: 'swap'; } @font-face { font-family: TwitterChirp; src: url(https://abs.twimg.com/fonts/v2/chirp-medium-web.woff2) format('woff2'); src: url(https://abs.twimg.com/fonts/v2/chirp-medium-web.woff) format('woff'); font-weight: 500; font-style: 'normal'; font-display: 'swap'; } @font-face { font-family: TwitterChirp; src: url(https://abs.twimg.com/fonts/v2/chirp-bold-web.woff2) format('woff2'); src: url(https://abs.twimg.com/fonts/v2/chirp-bold-web.woff) format('woff'); font-weight: 700; font-style: 'normal'; font-display: 'swap'; } @font-face { font-family: TwitterChirp; src: url(https://abs.twimg.com/fonts/v2/chirp-heavy-web.woff2) format('woff2'); src: url(https://abs.twimg.com/fonts/v2/chirp-heavy-web.woff) format('woff'); font-weight: 800; font-style: 'normal'; font-display: 'swap'; }</style><meta charset="utf-8"> <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0,viewport-fit=cover"><link rel="preconnect" href="//abs.twimg.com"><link rel="dns-prefetch" href="//abs.twimg.com"><link rel="preconnect" href="//api.twitter.com"><link rel="dns-prefetch" href="//api.twitter.com"><link rel="preconnect" href="//pbs.twimg.com"><link rel="dns-prefetch" href="//pbs.twimg.com"><link 
rel="preconnect" href="//t.co"><link rel="dns-prefetch" href="//t.co"><link rel="preconnect" href="//video.twimg.com"><link rel="dns-prefetch" href="//video.twimg.com"><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/polyfills.86126f05.js" nonce=""><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/vendors~main.943109f5.js" nonce=""><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/i18n/en-GB.e698d8f5.js" nonce=""><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/main.1ccd30a5.js" nonce=""><meta property="fb:app_id" content="2231777543"> . . <noscript> <style> body { -ms-overflow-style: scrollbar; overflow-y: scroll; overscroll-behavior-y: none; } .errorContainer { background-color: #FFF; color: #0F1419; max-width: 600px; margin: 0 auto; padding: 10%; font-family: Helvetica, sans-serif; font-size: 16px; } .errorButton { margin: 3em 0; } .errorButton a { background: #1DA1F2; border-radius: 2.5em; color: white; padding: 1em 2em; text-decoration: none; } .errorButton a:hover, .errorButton a:focus { background: rgb(26, 145, 218); } .errorFooter { color: #657786; font-size: 80%; line-height: 1.5; padding: 1em 0; } .errorFooter a, .errorFooter a:visited { color: #657786; text-decoration: none; padding-right: 1em; } .errorFooter a:hover, .errorFooter a:active { text-decoration: underline; } #placeholder, #react-root { display: none !important; } body { background-color: #FFF !important; } </style> <div > <img width="46" height="38" srcset="https://abs.twimg.com/errors/logo46x38.png 1x, https://abs.twimg.com/errors/logo46x38@2x.png 2x" src="https://abs.twimg.com/errors/logo46x38.png" alt="Twitter" /> <h1>JavaScript is not available.</h1> <p>We’ve detected that JavaScript is disabled in this browser. 
Please enable JavaScript or switch to a supported browser to continue using twitter.com. You can see a list of supported browsers in our Help Centre.</p> <p ><a href="https://help.twitter.com/using-twitter/twitter-supported-browsers">Help Center</a></p> <p > <a href="https://twitter.com/tos">Terms of Service</a> <a href="https://twitter.com/privacy">Privacy Policy</a> <a href="https://support.twitter.com/articles/20170514">Cookie Policy</a> <a href="https://legal.twitter.com/imprint">Imprint</a> <a href="https://business.twitter.com/en/help/troubleshooting/how-twitter-ads-work.html?ref=web-twc-ao-gbl-adsinfo&utm_source=twc&utm_medium=web&utm_campaign=ao&utm_content=adsinfo">Ads info</a> © 2022 Twitter, Inc. </p> </div> </noscript> . . <script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/polyfills.86126f05.js"></script><script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/vendors~main.943109f5.js"></script><script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/i18n/en-GB.e698d8f5.js"></script><script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/main.1ccd30a5.js"></script><script nonce="">(function () { if (!window.__SCRIPTS_LOADED__['main']) { document.getElementById('ScriptLoadFailure').style.display = 'block'; var criticalScripts = ["polyfills","vendors~main","i18n","main"]; for (var i = 0; i < criticalScripts.length; i++) { var criticalScript = criticalScripts[i]; if (!window.__SCRIPTS_LOADED__[criticalScript]) { document.getElementsByName('failedScript')[0].value = criticalScript; break; } } } })();</script><script nonce="">document.cookie = decodeURIComponent("gt=1502387523636527105; Max-Age=10800; Domain=.twitter.com; Path=/; Secure");</script><script src="https://accounts.google.com/gsi/client" id="googleGSILibrary" async="" defer=""></script><script src="https://appleid.cdn-apple.com/appleauth/static/jsapi/appleid/1/en_US/appleid.auth.js" id="signInWithAppleJsLibrary" async="" defer=""></script></body></html>
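The page source above still contains the <noscript> fallback with the "JavaScript is not available." message, even though the page loaded: that block is part of the HTML Twitter serves, and the browser merely hides it when scripts run. So calling getText() on the whole source, as the question's code does, will most likely print the message either way. Below is a minimal sketch that keeps only the text outside <script>, <style> and <noscript> elements. It uses the standard library's html.parser instead of BeautifulSoup so it needs no extra packages; VisibleTextParser and visible_text are hypothetical names.

```python
from html.parser import HTMLParser

# Elements whose text content should not count as visible page text.
SKIP_TAGS = {"script", "style", "noscript"}


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside <script>, <style> or <noscript>."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0    # how many skipped elements we are currently inside
        self.chunks = []  # stripped text fragments outside skipped elements

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the visible text of an HTML document, joined by single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

In the question's code, print(visible_text(browser.page_source)) would take the place of the BeautifulSoup lines. To keep BeautifulSoup instead, call decompose() on every tag returned by soup(["script", "style", "noscript"]) before soup.getText(). Either way this only removes the fallback message; if the real content has not rendered yet after the fixed two-second sleep, the remaining text can still be nearly empty.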