Extract href from pages using JSON and scrape multiple pages

Time:06-16

import requests

url = "https://baroul-timis.ro/get-av-data?param=toti-avocatii"

payload={}
headers = {
  'Accept': 'text/html, */*; q=0.01',
  'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8,pt;q=0.7',
  'Cache-Control': 'no-cache',
  'Connection': 'keep-alive',
  'Cookie': '_csrf-frontend=ccc4c9069d6ad3816ea693a980ecbebda2770e9448ffe9fed17cdf397a5e2851a:2:{i:0;s:14:"_csrf-frontend";i:1;s:32:"J3N0AJG6xybnGl91dfrlt-qMOk3hfbQ6";}',
  'Pragma': 'no-cache',
  'Referer': 'https://baroul-timis.ro/tabloul-avocatilor/',
  'Sec-Fetch-Dest': 'empty',
  'Sec-Fetch-Mode': 'cors',
  'Sec-Fetch-Site': 'same-origin',
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
  'X-Requested-With': 'XMLHttpRequest',
  'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="102", "Google Chrome";v="102"',
  'sec-ch-ua-mobile': '?0',
  'sec-ch-ua-platform': '"Windows"'
}

resp= requests.request("GET", url, headers=headers, data=payload).json()


sample=resp['data']
for test in sample:
    product=test['actions']
    print(product)

This prints an HTML anchor like this for each entry:

<a href="/tabloul-avocatilor/avocat/av-felicia-petre" ><i  aria-hidden="true"></i></a>

I also want to scrape multiple pages, but every page uses the same URL. From each entry I only want the href, like this:

/tabloul-avocatilor/avocat/av-felicia-petre
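If you would rather not add a dependency just to pull one attribute out of the `actions` string, a minimal sketch using only the standard library's `html.parser` could look like this. The class `HrefExtractor` and the function `extract_href` are hypothetical helpers introduced for illustration; the sample string is the anchor shown above.

```python
from html.parser import HTMLParser


class HrefExtractor(HTMLParser):
    """Hypothetical helper: collect the href attribute of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_href(actions_html):
    """Return the first href found in an HTML fragment, or None if there is none."""
    parser = HrefExtractor()
    parser.feed(actions_html)
    parser.close()
    return parser.hrefs[0] if parser.hrefs else None


sample = '<a href="/tabloul-avocatilor/avocat/av-felicia-petre" ><i  aria-hidden="true"></i></a>'
print(extract_href(sample))  # /tabloul-avocatilor/avocat/av-felicia-petre
```

In the loop from the question, `print(extract_href(test['actions']))` would then print only the relative link instead of the whole anchor tag.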

Answer:

To get all 948 names and links, you can use the following example:

import requests
from bs4 import BeautifulSoup


url = "https://baroul-timis.ro/get-av-data?param=toti-avocatii"

data = requests.get(url).json()

for i, d in enumerate(data["data"], 1):
    first_name = d["firstname"]
    last_name = BeautifulSoup(d["lastname"], "html.parser").text
    link = BeautifulSoup(d["actions"], "html.parser").a["href"]
    print(
        "{:<3} {:<30} {:<30} {}".format(
            i, first_name[:29], last_name[:29], link
        )
    )

Prints:


...

943 Adela-Ioana                    FRUNZĂ                         /tabloul-avocatilor/avocat/av-adela-frunza
944 Marina                         GLIGOR-VOLSCHI                 /tabloul-avocatilor/avocat/av-marina-gligor-volschi
945 Denis-Alexandru                TOTH                           /tabloul-avocatilor/avocat/av-denis-toth
946 Raluca-Roxana                  ȘURIANU                        /tabloul-avocatilor/avocat/av-raluca-surianu
947 Alexandra-Bianka               CIOBANU                        /tabloul-avocatilor/avocat/av-alexandra-ciobanu
948 Alexandra-Oana                 OLARIU                         /tabloul-avocatilor/avocat/av-alexandra-olariu
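Because this single request already returns all 948 rows, there seems to be no need to request the listing "pages" separately. The hrefs are relative, though, so visiting each lawyer's profile page means resolving them against the site root first. The sketch below assumes the site root is `https://baroul-timis.ro/` and introduces a hypothetical `scrape_profiles` helper that takes the actual download function as a parameter (for example `lambda url: requests.get(url).text`), so the looping logic can be tried without touching the network.

```python
import time
from urllib.parse import urljoin

# Assumption: profile links are relative to the site root used in the question.
BASE_URL = "https://baroul-timis.ro/"


def to_absolute(href, base=BASE_URL):
    """Resolve a relative href like /tabloul-avocatilor/avocat/... to a full URL."""
    return urljoin(base, href)


def scrape_profiles(hrefs, fetch, delay=1.0):
    """Hypothetical helper: call fetch(url) for each profile link.

    fetch is any callable taking a URL and returning the page content.
    Returns a dict mapping absolute URL -> fetch result.
    """
    results = {}
    for href in hrefs:
        url = to_absolute(href)
        results[url] = fetch(url)
        time.sleep(delay)  # pause between requests so the server is not hammered
    return results


# Demo with a stub fetcher (no network); swap in a real HTTP call to scrape.
links = [
    "/tabloul-avocatilor/avocat/av-denis-toth",
    "/tabloul-avocatilor/avocat/av-adela-frunza",
]
pages = scrape_profiles(links, fetch=lambda url: f"<html for {url}>", delay=0)
for url in pages:
    print(url)
```

Passing `fetch` in keeps the HTTP client choice (and any headers or session) outside the loop; with `requests`, something like `session = requests.Session()` and `fetch=lambda url: session.get(url).text` would reuse one connection for all profile pages.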