How to fetch specific data from divs with the same class using BeautifulSoup

Time:08-29

I have this link: https://www.cagematch.net/?id=2&nr=448&gimmick=Adam Pearce

On this page the data sits in divs that all share the same class name, but I only want to fetch specific divs. For example, I want the current gimmick, then the age and the brand, and nothing else. I tried this code:

import requests
from csv import writer
from bs4 import BeautifulSoup

url = "http://www.cagematch.net/?id=8&nr=1&page=15"
headers = {"Accept-Encoding": "deflate"}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

links = [
    "https://www.cagematch.net/" + a["href"] for a in soup.select(".TCol a")
]
list = []
for u in links:
    soup = BeautifulSoup(
        requests.get(u, headers=headers).content, "html.parser"
    )
    # print(soup.h1.text)
    with open("wwe/maw.csv", 'a', encoding="utf-8", newline="") as f:
        wrt = writer(f)
        for info in soup.select(".InformationBoxRow"):
            # header1 = info.select_one(".InformationBoxTitle").text
            # header2 = info.select_one(".InformationBoxContents").text
            # table = [header1, header2, "/"]

            # select_one returns the single contents div inside this row
            header1 = info.select_one(".InformationBoxContents").text
            print(header1)

With this I get all the data from the page, but I only want to fetch some of it. How can I do that? Is there an easy way?

CodePudding user response:

Try:

import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {"Accept-Encoding": "deflate"}


def get_info(url):
    soup = BeautifulSoup(
        requests.get(url, headers=headers).content, "html.parser"
    )

    title = soup.h1.text.strip()
    # "+ div" selects the contents div that directly follows the matching title div
    gimmick = soup.select_one(
        '.InformationBoxTitle:-soup-contains("Current gimmick") + div'
    ).text.strip()
    age = soup.select_one(
        '.InformationBoxTitle:-soup-contains("Age") + div'
    ).text.strip()

    return {"Name": title, "Gimmick": gimmick, "Age": age}


data = []
urls = ["https://www.cagematch.net/?id=2&nr=448&gimmick=Adam Pearce"]
for url in urls:
    data.append(get_info(url))

df = pd.DataFrame(data)
print(df)

Prints:

          Name      Gimmick       Age
0  Adam Pearce  Adam Pearce  44 years
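
The question also asks for the brand, and select_one returns None when a row is missing, so calling .text on it raises an AttributeError. A rough sketch of one way around both: read every InformationBoxRow into a dict once, then pick the keys you want. The sample HTML below is a made-up stand-in for the real page (the "Brand:" label and its value are assumptions, as are the helper names rows_to_dict and pick_fields), so check the actual labels on the site before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the row structure named in the question.
# "Example Brand" and "Example City" are placeholder values, not real data.
SAMPLE_HTML = """
<div class="InformationBoxTable">
  <div class="InformationBoxRow">
    <div class="InformationBoxTitle">Current gimmick:</div>
    <div class="InformationBoxContents">Adam Pearce</div>
  </div>
  <div class="InformationBoxRow">
    <div class="InformationBoxTitle">Age:</div>
    <div class="InformationBoxContents">44 years</div>
  </div>
  <div class="InformationBoxRow">
    <div class="InformationBoxTitle">Brand:</div>
    <div class="InformationBoxContents">Example Brand</div>
  </div>
  <div class="InformationBoxRow">
    <div class="InformationBoxTitle">Birthplace:</div>
    <div class="InformationBoxContents">Example City</div>
  </div>
</div>
"""


def rows_to_dict(soup):
    """Map each row's title (without the trailing colon) to its contents."""
    info = {}
    for row in soup.select(".InformationBoxRow"):
        title = row.select_one(".InformationBoxTitle")
        contents = row.select_one(".InformationBoxContents")
        if title is None or contents is None:
            continue  # skip rows that do not have both parts
        key = title.get_text(strip=True).rstrip(":")
        info[key] = contents.get_text(strip=True)
    return info


def pick_fields(soup, wanted):
    """Return only the wanted fields; missing ones come back as None."""
    info = rows_to_dict(soup)
    return {name: info.get(name) for name in wanted}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
selected = pick_fields(soup, ["Current gimmick", "Age", "Brand", "Height"])
print(selected)
```

Because missing fields come back as None instead of raising, the same function can be run over many profile pages and the results passed straight to pd.DataFrame.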

CodePudding user response:

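from requests import get
from bs4 import BeautifulSoup

result = get(url='https://www.cagematch.net/?id=2&nr=448&gimmick=Adam Pearce')
src = result.content

# The 'lxml' parser must be installed separately
soup = BeautifulSoup(src, 'lxml')

# Returns every contents div on the page, not just selected fields
result = soup.find_all("div", {"class": "InformationBoxContents"})
print(result)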
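
find_all on InformationBoxContents still returns every field. If you prefer the find-style API, one possible way to narrow it down is to locate the title div by its exact label and then step to the contents div next to it. This is only a sketch: the sample HTML is invented to mirror the class names in the question, the "Brand:" label is an assumption, and get_field is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup using the class names from the question.
# "Example Brand" is a placeholder value, not real data.
SAMPLE_HTML = """
<div class="InformationBoxRow">
  <div class="InformationBoxTitle">Current gimmick:</div>
  <div class="InformationBoxContents">Adam Pearce</div>
</div>
<div class="InformationBoxRow">
  <div class="InformationBoxTitle">Age:</div>
  <div class="InformationBoxContents">44 years</div>
</div>
<div class="InformationBoxRow">
  <div class="InformationBoxTitle">Brand:</div>
  <div class="InformationBoxContents">Example Brand</div>
</div>
"""


def get_field(soup, label):
    """Return the contents text for an exact title label, or None if absent."""
    title = soup.find(
        "div",
        class_="InformationBoxTitle",
        # Compare the whole label, ignoring surrounding space and the colon
        string=lambda s: s is not None and s.strip().rstrip(":") == label,
    )
    if title is None:
        return None
    contents = title.find_next_sibling("div", class_="InformationBoxContents")
    return contents.get_text(strip=True) if contents is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for label in ["Current gimmick", "Age", "Brand"]:
    print(label, "->", get_field(soup, label))
```

Matching the full label rather than a substring avoids accidentally picking up another row whose title merely contains the same word.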