Scrape a hidden phone number

Time:07-01

I've been having trouble extracting the phone number that appears after clicking the "Afficher le numéro" (show the number) button, without using Selenium.

Here is the URL of the page: https://www.mubawab.ma/fr/a/7469776/beau-terrain-à-la-vente-à-hay-izihar-superficie-68-m²-

Here's the code that I tried:

import re
import requests
from bs4 import BeautifulSoup

url = "https://www.mubawab.ma/fr/a/7469776/beau-terrain-à-la-vente-à-hay-izihar-superficie-68-m²-"
phone_url = "https://www.mubawab.ma/jSpBT9/gAEhoRFWpm8vGww==', 'adPage"

ad_id = re.search(r"(\d+)\.htm", url).group(1)

html_text = requests.get(phone_url.format(ad_id)).text

soup = BeautifulSoup(html_text, "html.parser")
phone = re.search(r"getTrackingPhone\((.*?)\)", html_text).group(1)

print(soup.select_one(".texto").get_text(strip=True), phone)
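One detail worth noting about the attempt above: the pattern expects `.htm` after the ad ID, but the URL in the question has no `.htm` in it, so `re.search` returns `None` and `.group(1)` raises an `AttributeError`. A minimal sketch of pulling the ID out instead, assuming ad URLs always have the `/a/<digits>/` shape seen in the example (the helper name `extract_ad_id` is made up for illustration):

```python
import re

url = "https://www.mubawab.ma/fr/a/7469776/beau-terrain-à-la-vente-à-hay-izihar-superficie-68-m²-"

def extract_ad_id(ad_url):
    """Return the numeric ad ID from a /a/<digits>/ URL, or None if absent."""
    # Assumption: the ID is the run of digits right after "/a/".
    match = re.search(r"/a/(\d+)/", ad_url)
    return match.group(1) if match else None

print(extract_ad_id(url))  # → 7469776
```

This only fixes the ID lookup; it does not by itself reveal the phone number, which the page loads after the button click.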

CodePudding user response:

In this case you will probably need Selenium. It is quite difficult to work out how the request payload is encoded, and reverse-engineering it would take many times longer. The most likely payload string is:

YR3gCzHEBrHR63YyPD95vui5tCyoyGZZRCtdUTrrJtw=

Converted to:

ᣢ㡒䄬ീ嬤㠰℠尯〴䀶̨ۀ⪠嘡ਣ䰧〪ိ䁇䁦㗠߆ྠ㎁㠤怬Ⱡ⧓iⴠ祬ö~删ങ校屵䀠瀤槨‣㏰׏⏠᪠ӠѴ㢠ზ5ⵝ䯭涇䰧ࠢ⬠ӕ倠㓡Ġ༠ⲠǠË䜕₈Ф纾㾚ુ圪$㛀Ś⵬R儨⒗Ᏼာ挥狩⬕䐠⮀㚐䈦޳ݕҊ冑懖咏࠳⧜性ᘂ㙻ⓔዠ佊摾妤໫䕖勩ᬕᣱ⋍῅庰䶬䟦䝱௅凸潹㈠䕪౤⠥㡃忭夠㭍㞹慳ၭ"☷ᦞ䂢䠷Р睢᭍ୀ㌵׃ऄ〢㝒桾ᾠ☡犱ⶼᔨᔔᕢ㕒⣢ℰ䝐⒡ڹ䐫㋜㸩啒㄄ᾼ昂纙ઽ瘲Ⲙẻˇ帠୏湧᥂၍令偱夦䡮ऀᕚદ慢爼⠖䧜傉䤴夬݅䡯兰摍䨳0㢔仦摔䈤沥冠汈᠂ᕢń⠀㥰䣖ѵဠ幸栠

Maybe the answer is obvious to someone else, but here is my own version with Selenium. Don't forget to download the WebDriver for your browser (Chrome in this example) and specify its path in the code.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

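# path must point to the downloaded chromedriver executable
# (newer Selenium releases expect a Service object instead of a bare path)
driver = webdriver.Chrome(path)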
driver.get('https://www.mubawab.ma/fr/a/7469776/beau-terrain-à-la-vente-à-hay-izihar-superficie-68-m²-')
script = BeautifulSoup(driver.page_source, 'lxml').find('div', class_='hide-phone-number-box').get('onclick')
elem = driver.find_element(By.CLASS_NAME, 'hide-phone-number-box')
driver.execute_script(script, elem)
timeout = 5  # seconds to wait for the phone number to appear
phone = None  # stays None if the wait below times out
try:
    element_present = EC.presence_of_element_located((By.CLASS_NAME, 'phoneText'))
    WebDriverWait(driver, timeout).until(element_present)
    phone = BeautifulSoup(driver.page_source, 'lxml').find('p', class_='phoneText').getText()
except TimeoutException:
    print("Timed out waiting for page to load")
print(phone)

OUTPUT:

+212 6 27 47 75 46
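The approach above works by reading the button's `onclick` attribute and executing it in the browser. To see that lookup in isolation, here is a minimal sketch using only the standard library on a made-up HTML fragment. The handler name `showPhone` and its arguments are hypothetical; only the class name `hide-phone-number-box` comes from the code above.

```python
from html.parser import HTMLParser

class OnclickFinder(HTMLParser):
    """Record the onclick attribute of the first tag that has a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.onclick = None

    def handle_starttag(self, tag, attrs):
        if self.onclick is not None:
            return  # already found a handler, ignore the rest
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.css_class in classes:
            self.onclick = attrs.get("onclick")

# Hypothetical fragment standing in for the real page source.
sample_html = '''
<div id="stickyDiv">
  <div class="hide-phone-number-box" onclick="showPhone('abc123', 'adPage')">Afficher le numéro</div>
</div>
'''

finder = OnclickFinder("hide-phone-number-box")
finder.feed(sample_html)
print(finder.onclick)
```

The static HTML holds only the handler, not the number itself, which is why something still has to run the JavaScript.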

CodePudding user response:

I have found a solution to my own problem without using Selenium. Plain requests can't get the phone number, because the page uses JavaScript to insert the number after the button is clicked. But you can use requests_html to render the JavaScript and get the phone number:

from requests_html import HTMLSession

url = "https://www.mubawab.ma/fr/a/7469776/beau-terrain-à-la-vente-à-hay-izihar-superficie-68-m²-"
session = HTMLSession()
r = session.get(url)

# get the onclick code from the button
onclick = r.html.xpath('//*[@id="stickyDiv"]/div[2]/div[1]/div')[0].attrs['onclick']

# put the onclick code in a script
script = f"() => {{{onclick}}}"

# render the script
r.html.render(sleep=1, timeout=20, script=script)

# get the phone number
phone_number = r.html.xpath('//*[@id="response"]/p')[0].text

print(phone_number)

OUTPUT:

06 27 47 75 46
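The triple braces in `f"() => {{{onclick}}}"` can be confusing at first glance. In an f-string, `{{` and `}}` are literal braces and `{onclick}` is the substituted value, so the result is a JavaScript arrow function whose body is the button's handler. A small sketch of just that step; the function name `wrap_onclick` and the sample handler `showPhone('abc123')` are made up for illustration:

```python
def wrap_onclick(onclick: str) -> str:
    """Wrap an onclick handler string in a JavaScript arrow function."""
    # "{{" -> "{", "{onclick}" -> the handler text, "}}" -> "}"
    return f"() => {{{onclick}}}"

print(wrap_onclick("showPhone('abc123')"))  # → () => {showPhone('abc123')}
```

That string is what gets passed as the `script` argument to `r.html.render` in the answer above.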