How to get the body of the table using Python?

Time:07-05

I am teaching myself web scraping and I am trying to get the tbody from a table with BeautifulSoup. My attempt:

import requests
from bs4 import BeautifulSoup

url = 'https://www.agrolok.pl/notowania/notowania-cen-pszenicy.htm'
page = requests.get(url).content
soup = BeautifulSoup(page, 'lxml')

table = soup.find_all('table', class_='hover')
print(table)

That's what I get:

<table ></table>

Any hints would be highly appreciated.

CodePudding user response:

The contents of the 'table' with class_='hover' (tbody, tr, td and so on) are loaded dynamically by JavaScript, which is why requests only gets an empty table and no tbody. You can render the page with selenium and then parse the result with pandas or bs4. I use selenium with pandas.
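
A quick way to check whether a table is filled in by JavaScript is to count the rows in the raw HTML before any browser is involved. The sketch below is only an illustration: count_static_rows is a hypothetical helper, and the two HTML strings are made-up stand-ins for what requests might return, not the real page.

```python
from bs4 import BeautifulSoup


def count_static_rows(html, css_class="hover"):
    """Count <tr> elements inside the first table with the given class.

    Returns None when the raw HTML contains no such table at all.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=css_class)
    if table is None:
        return None
    return len(table.find_all("tr"))


# Hypothetical stand-ins for the text returned by requests.get(url).text
empty_shell = '<html><body><table class="hover"></table></body></html>'
filled = (
    '<table class="hover"><tbody>'
    "<tr><td>7/4/2022</td><td>1410</td></tr>"
    "</tbody></table>"
)

print(count_static_rows(empty_shell))  # 0 -> rows are probably added later by JavaScript
print(count_static_rows(filled))       # 1 -> rows are already in the static HTML
```

If the count is 0 for the downloaded page but the browser shows rows, the data arrives after the initial load and plain requests will not see it.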

Script:

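import time
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.agrolok.pl/notowania/notowania-cen-pszenicy.htm')
driver.maximize_window()
time.sleep(2)  # give the page's JavaScript time to fill in the table

soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # the rendered HTML is captured, so the browser can be closed

# read_html returns a list of DataFrames; take the first table on the page
# (StringIO avoids the deprecation warning newer pandas raises for literal HTML)
df = pd.read_html(StringIO(str(soup)))[0]
# promote the first row to column names and drop it from the data
d = df.rename(columns=df.iloc[0]).drop(df.index[0])
print(d)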

Output:

7/4/2022  1410  1380  343.25  4.7002  1613  1640
1    7/1/2022  1410  1300  334.50  4.7176  1578  1630
2   6/30/2022  1410  1320  350.25  4.6806  1639  1650
3   6/29/2022  1500  1380  358.50  4.6809  1678  1710
4   6/28/2022  1450  1360  356.75  4.7004  1677  1690
5   6/27/2022  1450  1360  350.00  4.6965  1644  1690
6   6/24/2022  1450  1360  357.25  4.7094  1682  1700
7   6/23/2022  1450  1360  359.00  4.7096  1691  1690
8   6/22/2022  1470  1410  370.50  4.6590  1726  1750
9   6/21/2022  1500  1370  372.50  4.6460  1731  1730
10  6/20/2022  1540  1460  388.25  4.6731  1814  1780
11  6/15/2022  1560  1460  392.75  4.6642  1832  1780
12  6/14/2022  1560  1460  392.25  4.6548  1826  1780
13  6/13/2022  1540  1460  394.50  4.6313  1827  1800
14  6/10/2022  1530  1450  391.75  4.6030  1803  1760
15   6/9/2022  1540  1500  386.25  4.5826  1770  1730
16   6/8/2022  1550  1520  381.75  4.5817  1749  1730
17   6/7/2022  1500  1540  385.50  4.5855  1768  1700
18   6/6/2022  1600  1510  397.50  4.5880  1824  1760
19   6/3/2022  1560  1490  378.25  4.5908  1736  1700
20   6/2/2022  1590  1490  382.50  4.5876  1755  1710
21   6/1/2022  1590  1490  380.50  4.5891  1746  1700
22  5/31/2022  1650  1560  392.25  4.5756  1795  1750
23  5/30/2022  1670  1590  406.75  4.5869  1866  1800
24  5/27/2022  1670  1580  414.75  4.6102  1912  1700
25  5/26/2022  1650  1580  409.50  4.6135  1889  1700
26  5/25/2022  1670  1600  404.50  4.5955  1859  1700
27  5/24/2022  1690  1630  410.50  4.6107  1893  1800
28  5/23/2022  1700  1600  426.00  4.6171  1966  1860
29  5/20/2022  1700  1630  420.75  4.6366  1951  1840
30  5/19/2022  1700  1640  422.25  4.6429  1960  1850
31  5/18/2022  1700  1640  430.50  4.6528  2003  1850
32  5/17/2022  1690  1640  438.25  4.6558  2040  1850
33  5/16/2022  1690  1640  438.25  4.6724  2048  1880
34  5/13/2022  1670  1560  416.50  4.6679  1944  1800
35  5/12/2022  1670  1540  414.25  4.6841  1940  1790
36  5/11/2022  1670  1540  403.25  4.6700  1883  1790
37  5/10/2022  1680  1560  396.50  4.6761  1854  1780
38   5/9/2022  1670  1560  394.50  4.7059  1856  1780
39   5/6/2022  1600  1580  406.25  4.6979  1909  1760
40   5/5/2022  1660  1610  401.00  4.6658  1871  1780
41   5/4/2022  1660  1630  390.50  4.6777  1827  1735
42  4/29/2022  1660  1630  400.75  4.6582  1867  1720
43  4/28/2022  1670  1640  416.50  4.6915  1954  1740
44  4/27/2022  1670  1630  418.25  4.7076  1969  1720
45  4/26/2022  1660  1640  415.25  4.6429  1928  1685
46  4/25/2022  1665  1630  408.25  4.6405  1894  1670
47  4/22/2022  1665  1650  407.00  4.6361  1887  1690
48  4/21/2022  1660  1650  405.75  4.6523  1888  1690
49  4/20/2022  1660  1660  398.50  4.6295  1845  1700
50  4/19/2022  1680  1660  399.50  4.6361  1852  1740
51  4/15/2022  1690  1660  401.00  4.6378  1860  1770
52  4/14/2022  1690  1660  401.00  4.6447  1863  1770
53  4/13/2022  1680  1630  403.00  4.6460  1872  1780
54  4/12/2022  1650  1620  399.25  4.6626  1862  1700
55  4/11/2022  1630  1590  379.50  4.6451  1763  1670
56   4/8/2022  1650  1610  372.75  4.6405  1730  1660
57   4/7/2022  1650  1610  363.75  4.6478  1691  1670
58   4/6/2022  1650  1600  364.00  4.6539  1694  1670
59   4/5/2022  1650  1620  364.50  4.6317  1688  1680
60   4/4/2022  1640  1610  363.75  4.6373  1687  1680

CodePudding user response:

from bs4 import BeautifulSoup

# HTML is the page markup as a string
soup = BeautifulSoup(HTML, "html.parser")

# the first argument to find tells it which tag to search for;
# as the second you can pass a dict of attr->value pairs to filter
# the results that match that tag
table = soup.find("table", {"title": "TheTitle"})

# find_all returns every tr in the table (each as a BeautifulSoup Tag)
rows = list(table.find_all("tr"))

# now you can search the rows to pull out the times
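
To go from the tr objects to plain values, each row's cells can be read with get_text. This is a minimal sketch: table_to_rows is a hypothetical helper, and the sample table (including its "Date" and "Price" headers) is invented for illustration.

```python
from bs4 import BeautifulSoup


def table_to_rows(table):
    """Return the table's cells as a list of lists of stripped strings."""
    data = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:  # skip rows that contain no cells
            data.append(cells)
    return data


# Made-up sample markup, not taken from the real page
sample = """
<table title="TheTitle">
  <tr><th>Date</th><th>Price</th></tr>
  <tr><td>7/4/2022</td><td>1410</td></tr>
  <tr><td>7/1/2022</td><td>1410</td></tr>
</table>
"""

table = BeautifulSoup(sample, "html.parser").find("table", {"title": "TheTitle"})
print(table_to_rows(table))
# [['Date', 'Price'], ['7/4/2022', '1410'], ['7/1/2022', '1410']]
```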

CodePudding user response:

# table is the list of matching tables from the question
for i in table:
    tbody = i.find_all('tbody')