How to pull a row from an HTML table with Python

Time:06-07

I'm trying to pull a number that is inside a td, but the td elements have repeated classes and the table has no class (or tr class) to select on. How can I get this number (1,00)?

This is the HTML:

[screenshot of the table markup; image not preserved]

My code:

import requests
from bs4 import BeautifulSoup as BS

sample_website = ('https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/pagamentos-e-parcelamentos/taxa-de-juros-selic#Taxa_de_Juros_Selic')

page=requests.get(sample_website)

soup = BS(page.content, "html.parser")

for row in soup.select('table')[1:]:
    taxa = soup.select('tr')[5:]
    valor_especifico = row.find_all('td')[5:]

print(valor_especifico)

This is the output:

C:\Users\Francisco\PycharmProjects\INSS\Scripts\python.exe C:/Users/Francisco/PycharmProjects/INSS/MODULOS/web.py
[<td  style="text-align: center; "><strong>1999</strong></td>, <td  height="19"> <strong>janeiro</strong></td>, <td  style="text-align: center; ">391,17</td>, <td  style="text-align: center; ">349,88</td>, <td  style="text-align: center; ">326,26</td>, <td  style="text-align: center; ">302,97</td>, <td  style="text-align: center; ">277,88</td>, <td  height="19"> <strong>fevereiro</strong></td>, <td  style="text-align: center; ">387,54</td>, <td  style="text-align: center; ">347,53</td>, <td  style="text-align: center; ">324,59</td>, <td  style="text-align: center; ">300,84</td>, <td  style="text-align: center; ">275,50</td>, <td  height="19"> <strong>março</strong></td>, <td  style="text-align: center; ">384,94</td>, <td  style="text-align: center; ">345,31</td>, <td  style="text-align: center; ">322,95</td>, <td  style="text-align: center; ">298,64</td>, <td  style="text-align: center; ">272,17</td>, <td  height="19"> <strong>abril</strong></td>, <td  style="text-align: center; ">380,68</td>, <td  style="text-align: center; ">343,24</td>, <td  style="text-align: center; ">321,29</td>, <td  style="text-align: center; ">296,93</td>, <td  style="text-align: center; ">269,82</td>, <td  height="19"> <strong>maio</strong></td>, <td  style="text-align: center; ">376,43</td>, <td  style="text-align: center; ">341,23</td>, <td  style="text-align: center; ">319,71</td>, <td  style="text-align: center; ">295,30</td>, <td  style="text-align: center; ">267,80</td>, <td  height="19"> <strong>junho</strong></td>, <td  style="text-align: center; ">372,39</td>, <td  style="text-align: center; ">339,25</td>, <td  style="text-align: center; ">318,10</td>, <td  style="text-align: center; ">293,70</td>, <td  style="text-align: center; ">266,13</td>, <td  height="19"> <strong>julho</strong></td>, <td  style="text-align: center; ">368,37</td>, <td  style="text-align: center; ">337,32</td>, <td  style="text-align: center; ">316,50</td>, <td  style="text-align: center; ">292,00</td>, <td  
style="text-align: center; ">264,47</td>, <td  height="19"> <strong>agosto</strong></td>, <td  style="text-align: center; ">364,53</td>, <td  style="text-align: center; ">335,35</td>, <td  style="text-align: center; ">314,91</td>, <td  style="text-align: center; ">290,52</td>, <td  style="text-align: center; ">262,90</td>, <td  height="19"> <strong>setembro</strong></td>, <td  style="text-align: center; ">361,21</td>, <td  style="text-align: center; ">333,45</td>, <td  style="text-align: center; ">313,32</td>, <td  style="text-align: center; ">288,03</td>, <td  style="text-align: center; ">261,41</td>, <td  height="19"> <strong>outubro</strong></td>, <td  style="text-align: center; ">358,12</td>, <td  style="text-align: center; ">331,59</td>, <td  style="text-align: center; ">311,65</td>, <td  style="text-align: center; ">285,09</td>, <td  style="text-align: center; ">260,03</td>, <td  height="19"> <strong>novembro</strong></td>, <td  style="text-align: center; ">355,24</td>, <td  style="text-align: center; ">329,79</td>, <td  style="text-align: center; ">308,61</td>, <td  style="text-align: center; ">282,46</td>, <td  style="text-align: center; ">258,64</td>, <td  height="19"> <strong>dezembro</strong></td>, <td  style="text-align: center; ">352,46</td>, <td  style="text-align: center; ">327,99</td>, <td  style="text-align: center; ">305,64</td>, <td  style="text-align: center; ">280,06</td>, <td  style="text-align: center; ">257,04</td>]

Process finished with exit code 0

CodePudding user response:

If I understand you correctly, you want to select the value 1,00 from the table "Taxa de Juros Selic Acumulada Mensalmente":

import requests
from bs4 import BeautifulSoup


url = "https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/pagamentos-e-parcelamentos/taxa-de-juros-selic#Taxa_de_Juros_Selic"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# select the correct table (the first <table> after the #Selicmensalmente anchor):
table = soup.select_one("#Selicmensalmente").find_next("table")

# select the current row (the one that contains "maio"), searching only this table:
current_row = table.select_one("tr:-soup-contains(maio)")

# get all non-empty cell values in that row:
values = [s for td in current_row.find_all("td") if (s := td.get_text(strip=True))]

# print the last one:
print(values[-1])

Prints:

1,00
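
The scraped value is a string with a comma as the decimal separator, so it usually needs converting before any arithmetic. Below is a minimal sketch, assuming the cell text looks like the values shown above (for example 1,00 or 391,17). The helper name parse_br_decimal is made up for this example and is not part of BeautifulSoup:

```python
from decimal import Decimal


def parse_br_decimal(text: str) -> Decimal:
    """Convert a Brazilian-formatted number such as '1.234,56' to a Decimal.

    Hypothetical helper for illustration; assumes '.' is only ever used
    as a thousands separator and ',' as the decimal separator.
    """
    # Drop thousands separators (dots), then turn the decimal comma into a dot.
    normalized = text.strip().replace(".", "").replace(",", ".")
    return Decimal(normalized)


taxa = parse_br_decimal("1,00")
print(taxa)  # 1.00
```

In the answer's code this would be called as parse_br_decimal(values[-1]). Decimal is used instead of float so that the rate keeps its exact two decimal places.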