How to fix JavaScript is not available output, when web scraping dynamic sites with selenium and bea

Time:03-12

I have looked everywhere for a solution (including old Stack Overflow posts on related issues) to get rid of the "JavaScript is not available" output. I get it for dynamic sites, so I switched from the requests library to Selenium, but I still get the same result. Does anybody know how to fix this so that dynamic sites can be scraped? I simply want to retrieve the text from the page. I have exhausted every approach I could find. My code is below; feel free to add to it or recommend a solution.

Console output: JavaScript is not available. We’ve detected that JavaScript is disabled in this browser. Please enable JavaScript or switch to a supported browser to continue using twitter.com. You can see a list of supported browsers in our Help Center. Help Center

Below is my code:

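import time

from selenium import webdriver
from bs4 import BeautifulSoup

# Build the options first so they can actually be passed to the driver.
# JavaScript is on by default, so no extra switch is needed for it.
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window

# Assumes chromedriver can be found (on PATH or via Selenium's own lookup).
browser = webdriver.Chrome(options=options)

browser.get("https://twitter.com/")  # two slashes after "https:"
time.sleep(2)  # crude wait for the page's scripts to render

html = browser.page_source
browser.quit()

soup = BeautifulSoup(html, 'html.parser')
L = soup.getText()
print(L)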

CodePudding user response:

Your URL is incorrect: it has only one slash after "https:". It should be https://twitter.com/
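
A quick way to catch this kind of typo before handing a URL to the driver is to parse it first. The sketch below uses only the standard library; `has_host` is a hypothetical helper name, not part of Selenium. Chrome itself may quietly tolerate the single slash, so treat this as a sanity check rather than a guaranteed root-cause fix.

```python
from urllib.parse import urlparse


def has_host(url: str) -> bool:
    """Return True if url is http(s) and actually contains a host part."""
    parts = urlparse(url)
    # With a single slash, urlparse puts "www.twitter.com" into the path
    # and leaves netloc (the host) empty.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ("https:/www.twitter.com/", "https://twitter.com/"):
        verdict = "ok" if has_host(candidate) else "no host found"
        print(candidate, "->", verdict)
```

Calling `has_host(url)` right before `browser.get(url)` makes a malformed URL fail loudly instead of depending on how forgiving the browser is.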

Twitter uses bot detection, and when the browser is driven by Selenium it looks for telltale data about the browser.

Basically, all you need to do is change the cdc_ string (a variable-name prefix embedded in the chromedriver binary) in the driver.

There is a link to the same question: link
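
The cdc_ change mentioned above can be sketched as a plain byte replacement. This is only an illustration under assumptions: `rename_cdc_marker` and the `abc_` replacement are hypothetical, newer chromedriver builds may not contain the marker at all, and a site may rely on other signals, so this alone may not be enough.

```python
def rename_cdc_marker(driver_bytes: bytes, new_prefix: bytes = b"abc_") -> bytes:
    """Return a copy of driver_bytes with every b"cdc_" replaced by new_prefix.

    The replacement must have the same length as the original marker;
    otherwise every later byte in the binary would shift and the
    executable would be corrupted.
    """
    marker = b"cdc_"
    if len(new_prefix) != len(marker):
        raise ValueError("new_prefix must be exactly 4 bytes long")
    return driver_bytes.replace(marker, new_prefix)


if __name__ == "__main__":
    # Stand-in bytes for illustration; a real run would read the driver file.
    sample = b"\x00window.cdc_example\x00"
    print(rename_cdc_marker(sample))
```

In practice the bytes would come from something like `pathlib.Path(...).read_bytes()` on a copy of the driver, with the result written to a new file so the original stays untouched.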

CodePudding user response:

JavaScript is enabled in all browsers by default unless you have explicitly disabled it. In this use case, it seems the browsing context started by the Selenium-driven ChromeDriver is being detected as a bot.

However, I was able to retrieve the page source using a few tweaks, as follows:

  • Code Block:

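    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    # Stands in for options.headless = True, deprecated in later Selenium 4 releases
    options.add_argument("--headless")
    options.add_argument("start-maximized")
    # Hide the most obvious automation markers
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")
    s = Service('C:\\BrowserDrivers\\chromedriver.exe')
    driver = webdriver.Chrome(service=s, options=options)
    driver.get("https://twitter.com/")  # two slashes after "https:"
    print(driver.page_source)
    driver.quit()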
    
  • Console Output:

    <html dir="ltr" lang="en-GB" style="overflow-y: scroll; overscroll-behavior-y: none; font-size: 15px;"><head><style>input::placeholder { user-select: none; -webkit-user-select: none; }</style><style>@font-face {
      font-family: TwitterChirpExtendedHeavy;
      src: url(https://abs.twimg.com/fonts/v1/chirp-extended-heavy-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v1/chirp-extended-heavy-web.woff) format('woff');
      font-weight: 800;
      font-style: 'normal';
      font-display: 'swap';
    }
    @font-face {
      font-family: TwitterChirp;
      src: url(https://abs.twimg.com/fonts/v2/chirp-regular-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v2/chirp-regular-web.woff) format('woff');
      font-weight: 400;
      font-style: 'normal';
      font-display: 'swap';
    }
    @font-face {
      font-family: TwitterChirp;
      src: url(https://abs.twimg.com/fonts/v2/chirp-medium-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v2/chirp-medium-web.woff) format('woff');
      font-weight: 500;
      font-style: 'normal';
      font-display: 'swap';
    }
    @font-face {
      font-family: TwitterChirp;
      src: url(https://abs.twimg.com/fonts/v2/chirp-bold-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v2/chirp-bold-web.woff) format('woff');
      font-weight: 700;
      font-style: 'normal';
      font-display: 'swap';
    }
    @font-face {
      font-family: TwitterChirp;
      src: url(https://abs.twimg.com/fonts/v2/chirp-heavy-web.woff2) format('woff2');
      src: url(https://abs.twimg.com/fonts/v2/chirp-heavy-web.woff) format('woff');
      font-weight: 800;
      font-style: 'normal';
      font-display: 'swap';
    }</style><meta charset="utf-8">
    <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0,viewport-fit=cover"><link rel="preconnect" href="//abs.twimg.com"><link rel="dns-prefetch" href="//abs.twimg.com"><link rel="preconnect" href="//api.twitter.com"><link rel="dns-prefetch" href="//api.twitter.com"><link rel="preconnect" href="//pbs.twimg.com"><link rel="dns-prefetch" href="//pbs.twimg.com"><link rel="preconnect" href="//t.co"><link rel="dns-prefetch" href="//t.co"><link rel="preconnect" href="//video.twimg.com"><link rel="dns-prefetch" href="//video.twimg.com"><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/polyfills.86126f05.js" nonce=""><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/vendors~main.943109f5.js" nonce=""><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/i18n/en-GB.e698d8f5.js" nonce=""><link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web/main.1ccd30a5.js" nonce=""><meta property="fb:app_id" content="2231777543">
    .
    .
      <noscript>
        <style>
        body {
          -ms-overflow-style: scrollbar;
          overflow-y: scroll;
          overscroll-behavior-y: none;
        }
    
        .errorContainer {
          background-color: #FFF;
          color: #0F1419;
          max-width: 600px;
          margin: 0 auto;
          padding: 10%;
          font-family: Helvetica, sans-serif;
          font-size: 16px;
        }
    
        .errorButton {
          margin: 3em 0;
        }
    
        .errorButton a {
          background: #1DA1F2;
          border-radius: 2.5em;
          color: white;
          padding: 1em 2em;
          text-decoration: none;
        }
    
        .errorButton a:hover,
        .errorButton a:focus {
          background: rgb(26, 145, 218);
        }
    
        .errorFooter {
          color: #657786;
          font-size: 80%;
          line-height: 1.5;
          padding: 1em 0;
        }
    
        .errorFooter a,
        .errorFooter a:visited {
          color: #657786;
          text-decoration: none;
          padding-right: 1em;
        }
    
        .errorFooter a:hover,
        .errorFooter a:active {
          text-decoration: underline;
        }
    
          #placeholder,
          #react-root {
            display: none !important;
          }
          body {
            background-color: #FFF !important;
          }
        </style>
        <div >
          <img width="46" height="38" srcset="https://abs.twimg.com/errors/logo46x38.png 1x, https://abs.twimg.com/errors/logo46x38@2x.png 2x" src="https://abs.twimg.com/errors/logo46x38.png" alt="Twitter" />
          <h1>JavaScript is not available.</h1>
          <p>We’ve detected that JavaScript is disabled in this browser. Please enable JavaScript or switch to a supported browser to continue using twitter.com. You can see a list of supported browsers in our Help Centre.</p>
          <p ><a href="https://help.twitter.com/using-twitter/twitter-supported-browsers">Help Center</a></p>
        <p >
          <a href="https://twitter.com/tos">Terms of Service</a>
          <a href="https://twitter.com/privacy">Privacy Policy</a>
          <a href="https://support.twitter.com/articles/20170514">Cookie Policy</a>
          <a href="https://legal.twitter.com/imprint">Imprint</a>
          <a href="https://business.twitter.com/en/help/troubleshooting/how-twitter-ads-work.html?ref=web-twc-ao-gbl-adsinfo&utm_source=twc&utm_medium=web&utm_campaign=ao&utm_content=adsinfo">Ads info</a>
          © 2022 Twitter, Inc.
        </p>
    
        </div>
      </noscript>
      .
      .
      <script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/polyfills.86126f05.js"></script><script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/vendors~main.943109f5.js"></script><script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/i18n/en-GB.e698d8f5.js"></script><script type="text/javascript" charset="utf-8" nonce="" crossorigin="anonymous" src="https://abs.twimg.com/responsive-web/client-web/main.1ccd30a5.js"></script><script nonce="">(function () {
      if (!window.__SCRIPTS_LOADED__['main']) {
        document.getElementById('ScriptLoadFailure').style.display = 'block';
        var criticalScripts = ["polyfills","vendors~main","i18n","main"];
        for (var i = 0; i < criticalScripts.length; i++) {
          var criticalScript = criticalScripts[i];
          if (!window.__SCRIPTS_LOADED__[criticalScript]) {
            document.getElementsByName('failedScript')[0].value = criticalScript;
            break;
          }
        }
      }
    })();</script><script nonce="">document.cookie = decodeURIComponent("gt=1502387523636527105; Max-Age=10800; Domain=.twitter.com; Path=/; Secure");</script><script src="https://accounts.google.com/gsi/client" id="googleGSILibrary" async="" defer=""></script><script src="https://appleid.cdn-apple.com/appleauth/static/jsapi/appleid/1/en_US/appleid.auth.js" id="signInWithAppleJsLibrary" async="" defer=""></script></body></html>
    
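
One likely reason the question's `soup.getText()` still prints the message: the page source above keeps the `<noscript>` fallback in the markup, and a plain HTML parser returns its text along with everything else, whether or not scripts ran in the browser. The sketch below, using only the standard library, collects text while skipping `<noscript>`, `<script>` and `<style>` content. `VisibleTextExtractor` and `extract_visible_text` are hypothetical names, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser

# Elements whose contents should not count as page text.
SKIPPED_TAGS = {"noscript", "script", "style"}


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = (
        "<html><body>"
        "<noscript><h1>JavaScript is not available.</h1></noscript>"
        "<div id='react-root'><span>Hello</span></div>"
        "<script>var x = 1;</script>"
        "</body></html>"
    )
    print(extract_visible_text(sample))
```

With BeautifulSoup, which the question already uses, a similar effect can be had by calling `decompose()` on every tag returned by `soup.find_all("noscript")` before calling `getText()`.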