I am teaching myself web scraping and I am trying to get the tbody from a table with BeautifulSoup. My attempt:
import requests
from bs4 import BeautifulSoup
url = 'https://www.agrolok.pl/notowania/notowania-cen-pszenicy.htm'
page = requests.get(url).content
soup = BeautifulSoup(page, 'lxml')
table = soup.find_all('table', class_='hover')
print(table)
That's what I get:
<table ></table>
Any hints are highly appreciated.
CodePudding user response:
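The contents of the table you select with 'table', class_='hover' (tbody, tr, td and so on) are loaded dynamically by JavaScript, which is why requests alone does not give you the tbody.
You can get the data by rendering the page with Selenium and then parsing it with pandas or bs4. I use Selenium with pandas.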
Script:
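import time
from io import StringIO
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# Start Chrome; webdriver_manager downloads a matching chromedriver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.agrolok.pl/notowania/notowania-cen-pszenicy.htm')
driver.maximize_window()
time.sleep(2)  # crude wait so the JavaScript can fill in the table
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()
# Recent pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(str(soup)))[0]
# Use the first row as the column headers and drop it from the data
d = df.rename(columns=df.iloc[0]).drop(df.index[0])
print(d)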
Output:
7/4/2022 1410 1380 343.25 4.7002 1613 1640
1 7/1/2022 1410 1300 334.50 4.7176 1578 1630
2 6/30/2022 1410 1320 350.25 4.6806 1639 1650
3 6/29/2022 1500 1380 358.50 4.6809 1678 1710
4 6/28/2022 1450 1360 356.75 4.7004 1677 1690
5 6/27/2022 1450 1360 350.00 4.6965 1644 1690
6 6/24/2022 1450 1360 357.25 4.7094 1682 1700
7 6/23/2022 1450 1360 359.00 4.7096 1691 1690
8 6/22/2022 1470 1410 370.50 4.6590 1726 1750
9 6/21/2022 1500 1370 372.50 4.6460 1731 1730
10 6/20/2022 1540 1460 388.25 4.6731 1814 1780
11 6/15/2022 1560 1460 392.75 4.6642 1832 1780
12 6/14/2022 1560 1460 392.25 4.6548 1826 1780
13 6/13/2022 1540 1460 394.50 4.6313 1827 1800
14 6/10/2022 1530 1450 391.75 4.6030 1803 1760
15 6/9/2022 1540 1500 386.25 4.5826 1770 1730
16 6/8/2022 1550 1520 381.75 4.5817 1749 1730
17 6/7/2022 1500 1540 385.50 4.5855 1768 1700
18 6/6/2022 1600 1510 397.50 4.5880 1824 1760
19 6/3/2022 1560 1490 378.25 4.5908 1736 1700
20 6/2/2022 1590 1490 382.50 4.5876 1755 1710
21 6/1/2022 1590 1490 380.50 4.5891 1746 1700
22 5/31/2022 1650 1560 392.25 4.5756 1795 1750
23 5/30/2022 1670 1590 406.75 4.5869 1866 1800
24 5/27/2022 1670 1580 414.75 4.6102 1912 1700
25 5/26/2022 1650 1580 409.50 4.6135 1889 1700
26 5/25/2022 1670 1600 404.50 4.5955 1859 1700
27 5/24/2022 1690 1630 410.50 4.6107 1893 1800
28 5/23/2022 1700 1600 426.00 4.6171 1966 1860
29 5/20/2022 1700 1630 420.75 4.6366 1951 1840
30 5/19/2022 1700 1640 422.25 4.6429 1960 1850
31 5/18/2022 1700 1640 430.50 4.6528 2003 1850
32 5/17/2022 1690 1640 438.25 4.6558 2040 1850
33 5/16/2022 1690 1640 438.25 4.6724 2048 1880
34 5/13/2022 1670 1560 416.50 4.6679 1944 1800
35 5/12/2022 1670 1540 414.25 4.6841 1940 1790
36 5/11/2022 1670 1540 403.25 4.6700 1883 1790
37 5/10/2022 1680 1560 396.50 4.6761 1854 1780
38 5/9/2022 1670 1560 394.50 4.7059 1856 1780
39 5/6/2022 1600 1580 406.25 4.6979 1909 1760
40 5/5/2022 1660 1610 401.00 4.6658 1871 1780
41 5/4/2022 1660 1630 390.50 4.6777 1827 1735
42 4/29/2022 1660 1630 400.75 4.6582 1867 1720
43 4/28/2022 1670 1640 416.50 4.6915 1954 1740
44 4/27/2022 1670 1630 418.25 4.7076 1969 1720
45 4/26/2022 1660 1640 415.25 4.6429 1928 1685
46 4/25/2022 1665 1630 408.25 4.6405 1894 1670
47 4/22/2022 1665 1650 407.00 4.6361 1887 1690
48 4/21/2022 1660 1650 405.75 4.6523 1888 1690
49 4/20/2022 1660 1660 398.50 4.6295 1845 1700
50 4/19/2022 1680 1660 399.50 4.6361 1852 1740
51 4/15/2022 1690 1660 401.00 4.6378 1860 1770
52 4/14/2022 1690 1660 401.00 4.6447 1863 1770
53 4/13/2022 1680 1630 403.00 4.6460 1872 1780
54 4/12/2022 1650 1620 399.25 4.6626 1862 1700
55 4/11/2022 1630 1590 379.50 4.6451 1763 1670
56 4/8/2022 1650 1610 372.75 4.6405 1730 1660
57 4/7/2022 1650 1610 363.75 4.6478 1691 1670
58 4/6/2022 1650 1600 364.00 4.6539 1694 1670
59 4/5/2022 1650 1620 364.50 4.6317 1688 1680
60 4/4/2022 1640 1610 363.75 4.6373 1687 1680
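One way to confirm that the rows are injected by JavaScript is to check whether the markup you received contains any rows at all. The following is only a minimal sketch, not part of the script above: `table_has_rows` is a hypothetical helper, and the two HTML strings are made-up stand-ins for the static response and the browser-rendered page.

```python
from bs4 import BeautifulSoup

def table_has_rows(html, css_class="hover"):
    """Return True if the first <table> with the given class contains any <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=css_class)
    # find returns None when no matching table exists at all
    return table is not None and table.find("tr") is not None

# Made-up samples: an empty shell (like the requests result in the question)
# and a filled table (like the page after the browser has run its scripts).
static_html = '<table class="hover"></table>'
rendered_html = (
    '<table class="hover"><tbody>'
    "<tr><td>7/4/2022</td><td>1410</td></tr>"
    "</tbody></table>"
)

print(table_has_rows(static_html))    # False
print(table_has_rows(rendered_html))  # True
```

If the check returns False for the text fetched with requests but True for `driver.page_source`, the table is being filled in on the client side.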
CodePudding user response:
from bs4 import BeautifulSoup
# HTML is a string that holds the page markup
soup = BeautifulSoup(HTML, "lxml")
# the first argument to find is the tag to search for;
# the second can be a dict of attr->value pairs that filters
# the results matching that tag
table = soup.find("table", {"title": "TheTitle"})
# rows contains each tr in the table (as BeautifulSoup Tag objects)
rows = table.find_all("tr")
# you can now search each row to pull out the values you need
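To make the last step concrete, here is a small sketch of turning the collected rows into plain lists of cell text. The markup in `sample_html` is invented for illustration (including the "TheTitle" attribute value reused from the snippet above), so adapt the selector to the real page.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page.
sample_html = """
<table title="TheTitle">
  <tr><th>Date</th><th>Price</th></tr>
  <tr><td>7/4/2022</td><td>1410</td></tr>
  <tr><td>7/1/2022</td><td>1410</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table", {"title": "TheTitle"})

# One list of stripped cell strings per row; th and td both count as cells.
data = [
    [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    for row in table.find_all("tr")
]
print(data)
# [['Date', 'Price'], ['7/4/2022', '1410'], ['7/1/2022', '1410']]
```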
CodePudding user response:
for i in table:
    tbody = i.find_all('tbody')
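The loop is needed because find_all returns a list-like ResultSet of tables rather than a single tag. A minimal sketch of that idea on invented markup follows. Note that `sample_html` is an assumption, and on the static page from the question the result would be empty, because that table has no tbody until JavaScript runs.

```python
from bs4 import BeautifulSoup

# Made-up markup with two matching tables.
sample_html = """
<table class="hover"><tbody><tr><td>a</td></tr></tbody></table>
<table class="hover"><tbody><tr><td>b</td></tr></tbody></table>
"""
soup = BeautifulSoup(sample_html, "html.parser")

tables = soup.find_all("table", class_="hover")  # list-like ResultSet
# Take the first tbody of each matching table, skipping tables without one.
bodies = [t.find("tbody") for t in tables if t.find("tbody") is not None]
cells = [b.get_text(strip=True) for b in bodies]
print(cells)  # ['a', 'b']
```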