I am trying to scrape multiple WhatsApp messages from the same date with the following code. However, it only returns the first message of that date (4/21/2022). For instance, the expected output is:
Hey there (message 1)
How are you? (message 2)
WBU? (message 3)
but the actual output is:
Hey there (message 1)
Hey there (message 1)
Hey there (message 1)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

day = input("Enter day: ")
month = input("Enter month: ")
year = input("Enter year: ")
# Build an M/D/YYYY string, matching the format in data-pre-plain-text (e.g. "5/1/2022")
date = month + "/" + day + "/" + year

driver = webdriver.Chrome()
driver.get("https://web.whatsapp.com/")
# Wait up to 60 s for WhatsApp Web to finish loading
WebDriverWait(driver, 60).until(
    EC.text_to_be_present_in_element(
        (By.CLASS_NAME, '_1vjYt'), 'WhatsApp Web'
    )
)

# Read one contact name per line from cont.txt
listContact = []
with open('cont.txt', 'r') as f:
    for line in f:
        line = line.replace('\n', '')
        listContact.append(line)

for contact in listContact:
    driver.implicitly_wait(10)
    # Open the chat whose title matches the contact name
    hotel = driver.find_element(By.XPATH, '//span[@title="{}"]'.format(contact))
    hotel.click()
    driver.implicitly_wait(10)
    while (driver.find_element(
            By.CSS_SELECTOR, 'div[data-pre-plain-text*="{}"]'.format(date))):
        messages = driver.find_element(
            By.CSS_SELECTOR, 'div[data-pre-plain-text*="{}"]'.format(date))
        print(messages.text)
The HTML is as follows:
<div data-pre-plain-text="[2:39 PM, 5/1/2022] Joseph: ">
  <div >
    <span dir="ltr" >
      <span>
        Hey, there
      </span>
    </span>
  </div>
</div>
<div data-pre-plain-text="[2:40 PM, 5/1/2022] Joseph: ">
  <div >
    <span dir="ltr" >
      <span>
        How are you?
      </span>
    </span>
  </div>
</div>
<div data-pre-plain-text="[2:39 PM, 5/1/2022] Joseph: ">
  <div >
    <span dir="ltr" >
      <span>
        WBU?
      </span>
    </span>
  </div>
</div>
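In this sample, the data-pre-plain-text attribute appears to follow the pattern "[time, M/D/YYYY] Sender: ". The following is a minimal sketch of a helper that splits it into its parts, assuming that format holds. The format is inferred from these three elements only, and WhatsApp Web may change its markup at any time. HEADER_RE and parse_pre_plain_text are hypothetical names, not part of Selenium or WhatsApp.

```python
import re

# Hypothetical pattern inferred from the sample HTML only:
# "[<time>, <M/D/YYYY>] <sender>: "
HEADER_RE = re.compile(
    r"^\[(?P<time>[^,\]]+), (?P<date>\d{1,2}/\d{1,2}/\d{4})\] (?P<sender>.*?): ?$"
)


def parse_pre_plain_text(header):
    """Split a data-pre-plain-text value into (time, date, sender).

    Returns None if the value does not match the assumed format.
    """
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return match.group("time"), match.group("date"), match.group("sender")


print(parse_pre_plain_text("[2:39 PM, 5/1/2022] Joseph: "))
# ('2:39 PM', '5/1/2022', 'Joseph')
```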
Answer:
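The final while loop would be better rewritten with find_elements (plural), which returns a list you can iterate over:
elements = driver.find_elements(
    By.CSS_SELECTOR, 'div[data-pre-plain-text*="{}"]'.format(date))
for e in elements:
    print(e.text)
You are getting the same output because each pass through the while loop calls find_element again from scratch, and find_element always returns the first matching element, so the loop never moves on to the next message. Since a found element is always truthy, the loop also never ends on its own.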
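The following minimal offline sketch shows why repeating find_element cannot advance through the messages. FakeDriver and FakeElement are hypothetical stand-ins, not Selenium classes. They only mimic the contract that find_element returns the first match (the real driver raises NoSuchElementException when nothing matches) while find_elements returns a list of every match.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Hypothetical stand-in that mimics only first-match vs all-matches."""

    def __init__(self, texts):
        self._elements = [FakeElement(t) for t in texts]

    def find_element(self, by, value):
        # Always the FIRST match, on every call (Selenium would raise
        # NoSuchElementException instead of LookupError).
        if not self._elements:
            raise LookupError("no such element")
        return self._elements[0]

    def find_elements(self, by, value):
        # A list of ALL matches, possibly empty.
        return list(self._elements)


driver = FakeDriver(["Hey there", "How are you?", "WBU?"])

# "css selector" is the value of By.CSS_SELECTOR in Selenium.
repeated = [driver.find_element("css selector", "div").text for _ in range(3)]
print(repeated)  # ['Hey there', 'Hey there', 'Hey there']

all_texts = [e.text for e in driver.find_elements("css selector", "div")]
print(all_texts)  # ['Hey there', 'How are you?', 'WBU?']
```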
Answer:
find_element (without s at the end) always finds only the first matching element on the page, no matter how many times you call it.
You have to use find_elements (with s at the end) to get all matching elements, and then iterate over them with a for loop:
css = 'div[data-pre-plain-text*="{}"]'.format(date)
elements = driver.find_elements(By.CSS_SELECTOR, css)
for e in elements:
    print(e.text)
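One caveat applies to both answers: *= is a CSS substring match, so with M/D/YYYY dates a search for "1/5/2022" also matches messages dated "11/5/2022". The following is a minimal sketch of a post-filter for that case. It assumes the header format seen in the question's HTML. It takes the list returned by find_elements, reads each element's data-pre-plain-text with get_attribute, and keeps only exact date matches. messages_on_exact_date and FakeElement are hypothetical names. FakeElement only stands in for a Selenium WebElement so that the example runs offline.

```python
def messages_on_exact_date(elements, date):
    """Return the .text of elements whose header date equals `date` exactly.

    Assumes data-pre-plain-text looks like "[2:39 PM, 5/1/2022] Joseph: ".
    """
    texts = []
    for e in elements:
        header = e.get_attribute("data-pre-plain-text") or ""
        # "[2:39 PM, 5/1/2022] Joseph: " -> "2:39 PM, 5/1/2022"
        inside = header.partition("]")[0].lstrip("[")
        # "2:39 PM, 5/1/2022" -> "5/1/2022"
        msg_date = inside.rpartition(", ")[2]
        if msg_date == date:
            texts.append(e.text)
    return texts


class FakeElement:
    """Hypothetical offline stand-in for a Selenium WebElement."""

    def __init__(self, header, text):
        self._header = header
        self.text = text

    def get_attribute(self, name):
        return self._header if name == "data-pre-plain-text" else None


sample = [
    FakeElement("[2:39 PM, 1/5/2022] Joseph: ", "Hey there"),
    FakeElement("[2:40 PM, 1/5/2022] Joseph: ", "How are you?"),
    FakeElement("[9:15 AM, 11/5/2022] Joseph: ", "Different day"),
]

# What the *= selector would match: a plain substring test.
substring_hits = [
    e.text for e in sample if "1/5/2022" in e.get_attribute("data-pre-plain-text")
]
exact_hits = messages_on_exact_date(sample, "1/5/2022")
print(substring_hits)  # includes "Different day"
print(exact_hits)
```

With a real driver, you would call it as messages_on_exact_date(driver.find_elements(By.CSS_SELECTOR, css), date).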