How to get a specific tr element without class or id from an HTML document with BeautifulSoup?

Time:12-06

I have this webpage: https://www.epant.gr/apofaseis-gnomodotiseis/item/1451-apofasi-730-2021.html

and I need to scrape the second-to-last row of the large table. In other words, I need to get this (Ένδικα Μέσα -) from the table.

This is my progress so far:

from bs4 import BeautifulSoup
import requests
import csv


URL = 'https://www.epant.gr/apofaseis-gnomodotiseis/item/1451-apofasi-730-2021.html'
headers1 = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-61acac03-6279b8a6274777eb44d81aae", 
    "X-Client-Data": "CJW2yQEIpLbJAQjEtskBCKmdygEIuevKAQjr8ssBCOaEzAEItoXMAQjLicwBCKyOzAEI3I7MARiOnssB" }
page = requests.get(URL, headers = headers1)
soup1 = BeautifulSoup(page.content,"html.parser")
soup2 = BeautifulSoup(soup1.prettify(), "html.parser")
soup3 = soup2.find('td', text = "Ένδικα Μέσα")
print(soup3)

Thank you very much

Thank you very much, it works like a charm

Answer:

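You are close to a solution. Clean up your soups: re-parsing the output of prettify() puts every string on its own indented line, so the exact text match no longer finds the cell. Then take the parent of your result, which gives you the whole tr: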

soup.find('td', text = "Ένδικα Μέσα").parent.get_text(strip=True)

Or use find_next('td') to get the text of the neighbouring cell:

soup.find('td', text = "Ένδικα Μέσα").find_next('td').text
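A self-contained sketch of both calls, run against a small hypothetical HTML snippet instead of the live page (the real table's markup is an assumption here; only the "Ένδικα Μέσα" row comes from the question). It uses string=, which is the name of the text= argument in Beautiful Soup 4.4 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real table: only the "Ένδικα Μέσα" row
# is taken from the question, the other two rows are made up.
HTML = """
<table>
  <tr><td>Field A</td><td>value A</td></tr>
  <tr><td>Ένδικα Μέσα</td><td>-</td></tr>
  <tr><td>Field B</td><td>value B</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Find the label cell by its exact text.
label = soup.find("td", string="Ένδικα Μέσα")
if label is None:
    raise ValueError("label cell not found")

# .parent is the enclosing <tr>; the " " separator keeps the cells apart.
row_text = label.parent.get_text(" ", strip=True)

# find_next("td") is the next <td> in document order, i.e. the value cell.
value_text = label.find_next("td").get_text(strip=True)

print(row_text)    # Ένδικα Μέσα -
print(value_text)  # -
```

On this snippet, leaving out the separator (get_text(strip=True)) would join the two cells with no space, giving "Ένδικα Μέσα-".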

Example

from bs4 import BeautifulSoup
import requests
import csv

URL = 'https://www.epant.gr/apofaseis-gnomodotiseis/item/1451-apofasi-730-2021.html'
headers1 = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-61acac03-6279b8a6274777eb44d81aae", 
    "X-Client-Data": "CJW2yQEIpLbJAQjEtskBCKmdygEIuevKAQjr8ssBCOaEzAEItoXMAQjLicwBCKyOzAEI3I7MARiOnssB" }
page = requests.get(URL, headers = headers1)
soup = BeautifulSoup(page.content,"html.parser")
row = soup.find('td', text = "Ένδικα Μέσα").parent.get_text(strip=True)
print(row)

Output

Ένδικα Μέσα -
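To illustrate the "clean up your soups" advice, here is a minimal sketch on a tiny hypothetical table (not the live page). It shows that the prettify() round trip from the question breaks the exact text match, plus one whitespace-tolerant workaround; the is_label helper is a hypothetical name introduced only for this example.

```python
from bs4 import BeautifulSoup

# Minimal hypothetical table containing the row from the question.
HTML = "<table><tr><td>Ένδικα Μέσα</td><td>-</td></tr></table>"

soup = BeautifulSoup(HTML, "html.parser")

# prettify() puts each string on its own indented line; parsing that output
# again leaves the <td>'s string padded with newlines and spaces.
pretty = BeautifulSoup(soup.prettify(), "html.parser")

# Exact string matching works on the original soup but not on the re-parsed one.
found_original = soup.find("td", string="Ένδικα Μέσα")
found_pretty = pretty.find("td", string="Ένδικα Μέσα")


def is_label(s):
    """Hypothetical helper: compare after stripping; s may be None for tags
    that do not contain exactly one string."""
    return s is not None and s.strip() == "Ένδικα Μέσα"


# Passing a function as string= lets the comparison ignore the padding.
found_tolerant = pretty.find("td", string=is_label)

print(found_original is not None)  # True
print(found_pretty)                # None
print(found_tolerant.find_next("td").get_text(strip=True))  # -
```

In practice it is simpler to parse page.content once, as in the example above, and skip the prettify() step entirely.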

Answer:

You can use a CSS selector for that field. An easy way to get the selector for an element is to open your browser's inspector, right-click the HTML tag you want, and choose Copy > Copy selector.

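With BeautifulSoup you can then pass it to soup.select(selector), which returns a list of all matches, or to soup.select_one(selector) for the first match only. The documentation describes this in more detail.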
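A minimal sketch of the selector approach on a hypothetical three-row table (the live page's nesting is not reproduced here, so the exact selector for it is an assumption). It uses the standard :nth-last-child(2) pseudo-class to pick the second-to-last row, and shows a positional find_all("tr")[-2] alternative. One caveat: browsers insert a tbody element into tables, so a selector copied from the inspector may contain tbody even when the raw HTML that html.parser sees has none, and then it will not match.

```python
from bs4 import BeautifulSoup

# Hypothetical table: only the "Ένδικα Μέσα" row comes from the question.
HTML = """
<table>
  <tr><td>Field A</td><td>value A</td></tr>
  <tr><td>Ένδικα Μέσα</td><td>-</td></tr>
  <tr><td>Field B</td><td>value B</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# tr:nth-last-child(2) matches the <tr> that is second from the end
# among its sibling elements.
rows = soup.select("table tr:nth-last-child(2)")   # list of all matches
row = soup.select_one("table tr:nth-last-child(2)")  # first match or None
by_selector = row.get_text(" ", strip=True)

# Positional alternative without CSS: collect the rows and index from the end.
by_index = soup.find("table").find_all("tr")[-2].get_text(" ", strip=True)

print(by_selector)  # Ένδικα Μέσα -
print(by_index)     # Ένδικα Μέσα -
```

Both approaches depend on the row's position, so they break if the table gains or loses rows; matching the label text, as in the other answer, is more robust in that case.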
