Hello all, I have a problem extracting tweets from Twitter. I wrote a script that opens one of the trending pages on Twitter, scrolls down N times, and extracts the tweets as it scrolls. That works fine at first, but after a certain number of scrolls the page can't load new tweets: it stops scrolling and no new tweets appear.
For example, when I set N=1000 it works fine at first, but when it reaches around 400 to 600 scrolls, the scrolling stops and no more tweets appear.
I would be very happy if anyone can help me.
Thanks a lot.
My code is:
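import time

import tqdm
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def scrap_tweets_without(url, no_scroll):
    # Selenium 4 takes the chromedriver path through a Service object
    drive = webdriver.Chrome(service=Service(r'C:\selinum\chromedriver.exe'))
    drive.get(url)
    texts = []
    time.sleep(3)  # give the first batch of tweets time to render
    SCROLL_PAUSE_TIME = 0.3
    # Start scrolling through the tweets
    for _ in tqdm.tqdm(range(no_scroll)):
        # Scroll down by 200 pixels
        drive.execute_script("window.scrollBy(0, 200)")
        # Wait for new tweets to load
        time.sleep(SCROLL_PAUSE_TIME)
        try:
            # Get the group of tweets currently in the page
            tweets = drive.find_elements(By.XPATH, '//div[@data-testid="tweetText" and @lang="ar"]')
            # Add each tweet text to the list, skipping duplicates
            for tx in tweets:
                if tx.text not in texts:
                    texts.append(tx.text)
        except StaleElementReferenceException:
            # A tweet element was removed from the page while being read;
            # skip the rest of this batch and keep scrolling
            pass
    drive.quit()
    return texts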
url ='https://twitter.com/search?q="جمال علام"&src=trend_click&pt=1535911024460718080&vertical=trends'
data = scrap_tweets_without(url,1000)
My screenshot of the Selenium browser after 600 scrolls shows that the page can't scroll any further, and that run gives me only around 450 tweets. I believe there are more than 400 tweets for a single hashtag or search page. Can anyone explain why the page can't load more than that?
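The scroll loop above always runs `no_scroll` times, even after the page has stopped moving. A minimal sketch of one way to notice the stall, assuming you read the scroll position after each scroll with `drive.execute_script("return window.scrollY")`: the class `ScrollStallDetector` and its `patience` parameter are hypothetical names, not part of Selenium. It only detects that loading stopped; it does not make Twitter serve more tweets.

```python
class ScrollStallDetector:
    """Hypothetical helper: reports when a reading stops increasing.

    Feed it window.scrollY (or document.body.scrollHeight) after each scroll.
    """

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience   # consecutive non-increasing readings allowed
        self.best = -1             # largest reading seen so far
        self.stalled = 0           # non-increasing readings in a row

    def update(self, reading: int) -> bool:
        """Record one reading; return True once the page looks stalled."""
        if reading > self.best:
            self.best = reading
            self.stalled = 0       # the page moved, so reset the counter
        else:
            self.stalled += 1
        return self.stalled >= self.patience
```

Inside the loop you would call `detector.update(drive.execute_script("return window.scrollY"))` after each `time.sleep(...)` and `break` when it returns `True`. With a 0.3 s pause, `patience=10` means giving up after roughly 3 seconds without movement.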
Answer:
After searching a lot of sources, I found that my problem was that Twitter detects that I'm a Selenium bot rather than a real user, so it stops loading more tweets when I scroll down. Adding this function helped me:
from fake_headers import Headers
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


def initilaize_driver():
    options = webdriver.ChromeOptions()
    # Use a random, browser-like User-Agent instead of Chrome's default one
    header = Headers().generate()['User-Agent']
    options.add_argument('--headless')  # runs browser in headless mode
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--disable-gpu')
    options.add_argument('--log-level=3')
    options.add_argument('--disable-notifications')
    options.add_argument('--disable-popup-blocking')
    options.add_argument('--user-agent={}'.format(header))
    # webdriver_manager downloads a matching chromedriver; Selenium 4 takes it via Service
    drive = webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                             options=options)
    return drive
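`initilaize_driver()` builds a configured driver, but `scrap_tweets_without` still creates its own plain one. One way to connect them, sketched under the assumption that the browser actions are passed in as functions: `collect_texts`, `scroll_step` and `read_texts` are hypothetical names. The sketch also pairs the list with a set, so each duplicate check stays fast as hundreds of tweets accumulate while the list keeps first-seen order.

```python
from typing import Callable, Iterable, List


def collect_texts(
    scroll_step: Callable[[], None],
    read_texts: Callable[[], Iterable[str]],
    max_scrolls: int,
) -> List[str]:
    """Scroll up to max_scrolls times, collecting unique texts in first-seen order.

    scroll_step and read_texts are hypothetical hooks; with Selenium they
    would wrap drive.execute_script(...) and drive.find_elements(...).
    """
    texts: List[str] = []  # preserves the order tweets were first seen
    seen = set()           # O(1) membership test instead of scanning the list
    for _ in range(max_scrolls):
        scroll_step()
        for text in read_texts():
            if text not in seen:
                seen.add(text)
                texts.append(text)
    return texts
```

With Selenium, `drive = initilaize_driver()` and `drive.get(url)` come first. `scroll_step` can be a small function that runs `drive.execute_script("window.scrollBy(0, 200)")` and then `time.sleep(0.3)`, and `read_texts` can return `[e.text for e in drive.find_elements(By.XPATH, '//div[@data-testid="tweetText" and @lang="ar"]')]`; reading `.text` can raise `StaleElementReferenceException` if a tweet leaves the page, so `read_texts` may need to catch it. Keeping the browser calls outside `collect_texts` also lets you check the de-duplication without opening Chrome.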