I have a link: https://www.cagematch.net/?id=2&nr=448&gimmick=Adam+Pearce
On this page the data sits in divs that all share the same class name, but I only want to fetch specific ones: the current gimmick, the age, and the brand, nothing else. I tried this code:
import requests
from csv import writer
from bs4 import BeautifulSoup

url = "http://www.cagematch.net/?id=8&nr=1&page=15"
headers = {"Accept-Encoding": "deflate"}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
# Build absolute profile links from the table cells
links = [
    "https://www.cagematch.net/" + a["href"] for a in soup.select(".TCol a")
]
for u in links:
    soup = BeautifulSoup(
        requests.get(u, headers=headers).content, "html.parser"
    )
    # print(soup.h1.text)
    with open("wwe/maw.csv", "a", encoding="utf-8", newline="") as f:
        wrt = writer(f)
        for info in soup.select(".InformationBoxRow"):
            # header1 = info.select_one(".InformationBoxTitle").text
            # header2 = info.select_one(".InformationBoxContents").text
            # table = [header1, header2, "/"]
            header1 = info.select_one(".InformationBoxContents").text
            print(header1)
With this I get all the data from each page, but I only want to fetch some of the fields. How can I do that? Is there an easy way?
Answer:
Try:
import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {"Accept-Encoding": "deflate"}


def get_info(url):
    soup = BeautifulSoup(
        requests.get(url, headers=headers).content, "html.parser"
    )
    title = soup.h1.text.strip()
    # "+ div" selects the div that immediately follows the matching title div
    gimmick = soup.select_one(
        '.InformationBoxTitle:-soup-contains("Current gimmick") + div'
    ).text.strip()
    age = soup.select_one(
        '.InformationBoxTitle:-soup-contains("Age") + div'
    ).text.strip()
    return {"Name": title, "Gimmick": gimmick, "Age": age}


data = []
urls = ["https://www.cagematch.net/?id=2&nr=448&gimmick=Adam+Pearce"]
for url in urls:
    data.append(get_info(url))

df = pd.DataFrame(data)
print(df)
Prints:
          Name      Gimmick       Age
0  Adam Pearce  Adam Pearce  44 years
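The question also asks for the brand, and a profile may not have every field, in which case `select_one` returns None and `.text` raises an AttributeError. Here is a minimal sketch of a helper that looks a field up by its label and returns None when it is missing. The `get_field` name, the "Brand" label, and the inline sample HTML (including the placeholder brand value) are assumptions for illustration; check them against the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the assumed title/contents row layout.
SAMPLE_HTML = """
<div class="InformationBoxRow">
  <div class="InformationBoxTitle">Current gimmick:</div>
  <div class="InformationBoxContents">Adam Pearce</div>
</div>
<div class="InformationBoxRow">
  <div class="InformationBoxTitle">Age:</div>
  <div class="InformationBoxContents">44 years</div>
</div>
<div class="InformationBoxRow">
  <div class="InformationBoxTitle">Brand:</div>
  <div class="InformationBoxContents">Example Brand</div>
</div>
"""


def get_field(soup, label):
    """Return the text of the div right after the title containing `label`, or None."""
    tag = soup.select_one(
        f'.InformationBoxTitle:-soup-contains("{label}") + div'
    )
    return tag.get_text(strip=True) if tag else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# "Height" is absent from the sample, so it comes back as None instead of crashing.
labels = ("Current gimmick", "Age", "Brand", "Height")
info = {label: get_field(soup, label) for label in labels}
print(info)
```

To use it on a live page, build `soup` from `requests.get(url, headers=headers).content` as in the code above and pass only the labels you care about.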
Answer:
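from requests import get
from bs4 import BeautifulSoup

result = get(url="https://www.cagematch.net/?id=2&nr=448&gimmick=Adam+Pearce")
src = result.content
soup = BeautifulSoup(src, "lxml")  # requires the lxml parser to be installed
# Collect every value div; filter afterwards for the fields you need
result = soup.find_all("div", {"class": "InformationBoxContents"})
print(result)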
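Since `find_all` on the contents class returns every value on the page, one way to narrow it down is to pair each title with its value first and then keep only the keys you want. This is a rough sketch that assumes each `InformationBoxRow` holds one title div and one contents div; `rows_to_dict`, `pick`, and the sample HTML are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the last row stands in for fields you do not want.
SAMPLE_HTML = """
<div class="InformationBoxRow">
  <div class="InformationBoxTitle">Current gimmick:</div>
  <div class="InformationBoxContents">Adam Pearce</div>
</div>
<div class="InformationBoxRow">
  <div class="InformationBoxTitle">Age:</div>
  <div class="InformationBoxContents">44 years</div>
</div>
<div class="InformationBoxRow">
  <div class="InformationBoxTitle">Some other field:</div>
  <div class="InformationBoxContents">placeholder</div>
</div>
"""


def rows_to_dict(soup):
    """Map each row's title (without the trailing colon) to its contents text."""
    fields = {}
    for row in soup.find_all("div", class_="InformationBoxRow"):
        title = row.find("div", class_="InformationBoxTitle")
        value = row.find("div", class_="InformationBoxContents")
        if title and value:
            key = title.get_text(strip=True).rstrip(":")
            fields[key] = value.get_text(strip=True)
    return fields


def pick(fields, wanted):
    """Keep only the wanted keys; missing ones become None."""
    return {key: fields.get(key) for key in wanted}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
all_fields = rows_to_dict(soup)
picked = pick(all_fields, ("Current gimmick", "Age", "Brand"))
print(picked)
```

This avoids the `:-soup-contains` selector entirely, so it also works with older soupsieve versions, at the cost of parsing every row.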