How can I get multiple messages scraped?

Time:05-01

I am trying to scrape multiple WhatsApp messages from the same date with the following code. However, it only returns the first message of that date (4/21/2022), repeated over and over. For instance:

The required output is:

Hey there (message 1)

How are you? (message 2)

WBU? (message 3)

The actual output is:

Hey there (message 1)

Hey there (message 1)

Hey there (message 1)

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

day = input("Enter date: ")
month = input("Enter month: ")
year = input("Enter year: ")
date = month + "/" + day + "/" + year

driver = webdriver.Chrome()
driver.get("https://web.whatsapp.com/")

WebDriverWait(driver, 60).until(
    EC.text_to_be_present_in_element(
        (By.CLASS_NAME, '_1vjYt'), 'WhatsApp Web'
    )
)


listContact = []
with open('cont.txt', 'r') as f:
    for line in f:
        line = line.replace('\n', '')
        listContact.append(line)

for contact in listContact:
    driver.implicitly_wait(10)
    hotel = driver.find_element(By.XPATH, '//span[@title="{}"]'.format(contact))
    hotel.click()
    driver.implicitly_wait(10)

    while (driver.find_element(
           By.CSS_SELECTOR, 'div[data-pre-plain-text*="{}"]'.format(date))):
        messages = driver.find_element(
           By.CSS_SELECTOR, 'div[data-pre-plain-text*="{}"]'.format(date))
        print(messages.text)

The HTML is as follows:


<div data-pre-plain-text="[2:39 PM, 5/1/2022] Joseph: ">
    <div>
        <span dir="ltr">
            <span>
                Hey, there
            </span>
        </span>
    </div>
</div>

<div data-pre-plain-text="[2:40 PM, 5/1/2022] Joseph: ">
    <div>
        <span dir="ltr">
            <span>
                How are you?
            </span>
        </span>
    </div>
</div>

<div data-pre-plain-text="[2:39 PM, 5/1/2022] Joseph: ">
    <div>
        <span dir="ltr">
            <span>
                WBU?
            </span>
        </span>
    </div>
</div>

CodePudding user response:

The final while loop would be better rewritten as

# find_elements (plural) returns a list of every matching element
elements = driver.find_elements(
           By.CSS_SELECTOR, 'div[data-pre-plain-text*="{}"]'.format(date))
for e in elements:
    print(e.text)

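You are getting the same output because every iteration of the while loop runs a fresh, independent find_element call, and that call always returns the first matching element. Since the element never goes away, the loop also never ends.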

CodePudding user response:

find_element (without the s at the end) always finds only the first matching element on the page, no matter how many times you call it.

You have to use find_elements (with the s at the end) to get all matching elements, and then iterate over them with a for loop:

css = 'div[data-pre-plain-text*="{}"]'.format(date)

elements = driver.find_elements(By.CSS_SELECTOR, css)

for e in elements:
    print(e.text)
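
One caveat with the selector above: *= is a substring test, so a date such as 1/5/2022 would also match a message dated 11/5/2022. Below is a minimal sketch of one way to guard against that by reading the data-pre-plain-text attribute back and comparing the date exactly. The helper name messages_for_date and the regular expression are illustrative assumptions, based only on the prefix format shown in the question's HTML ("[2:39 PM, 5/1/2022] Joseph: "). The CSS_SELECTOR constant holds the same string as selenium's By.CSS_SELECTOR, written out so the sketch runs without selenium installed.

```python
import re

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"

# Assumed prefix format, taken from the question: "[2:39 PM, 5/1/2022] Joseph: "
PREFIX_RE = re.compile(r"^\[(?P<time>[^,\]]+), (?P<date>[^\]]+)\] (?P<sender>.*?): ?$")


def messages_for_date(driver, date):
    """Return (time, sender, text) for every message sent on exactly `date`.

    `driver` is anything with a selenium-style find_elements(by, value) method.
    """
    css = 'div[data-pre-plain-text*="{}"]'.format(date)
    results = []
    # find_elements (plural) returns a list of all matches, possibly empty.
    for element in driver.find_elements(CSS_SELECTOR, css):
        prefix = element.get_attribute("data-pre-plain-text") or ""
        match = PREFIX_RE.match(prefix)
        # Drop substring false positives by comparing the parsed date exactly.
        if match and match.group("date") == date:
            results.append(
                (match.group("time"), match.group("sender"), element.text.strip())
            )
    return results
```

With a real browser session this would be called as messages_for_date(driver, date) inside the loop over contacts, after the chat has been opened. If WhatsApp Web changes the prefix format, the regular expression will need adjusting.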