Is there a specific way of retrieving only the required information from an HTML tree? Example inclu

Time:10-31

I am using Python 3.8 and BeautifulSoup 4 to parse a website. The section I want to read is here:

<h1 class="pr-new-br">
     <a href="/rotring-x-b104743">Rotring</a>
     <span> 0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368 </span>
</h1>

I locate this element with the following code and get its text (soup is the BeautifulSoup object built from the page):

product_name_text = soup.select("h1.pr-new-br")[0].get_text()

However, this of course returns all of the text. I want to separate the text inside the <a href> from the text inside the <span>.

How can I do this? How can I look specifically for a tag, or for the link in an href?

Thank you very much in advance. I am pretty new to the field, so sorry if this is very basic.

CodePudding user response:

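The get_text method takes a separator argument that is placed between the text of the different elements, and strip=True trims the surrounding whitespace and drops whitespace-only pieces. As an example: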

product_name_text = soup.select("h1.pr-new-br")[0].get_text("|", strip=True)
# You will get -> Rotring|0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368
# Then split on the same symbol to get a list with each element's text
parts = product_name_text.split("|")
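
If the product text could ever contain the separator character itself, it may be safer to select the child tags directly instead of splitting a joined string. The following is only a minimal sketch: it assumes the markup looks exactly like the snippet in the question, and the variable names (heading, link, brand, brand_url, description) are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real page would be fetched first.
html = """
<h1 class="pr-new-br">
     <a href="/rotring-x-b104743">Rotring</a>
     <span> 0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368 </span>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first match, or None if nothing matches.
heading = soup.select_one("h1.pr-new-br")

# Pick each child tag on its own instead of reading the whole heading.
link = heading.select_one("a")
brand = link.get_text(strip=True)        # text between <a> and </a>
brand_url = link["href"]                 # value of the href attribute
description = heading.select_one("span").get_text(strip=True)

# The separator approach from the answer, for comparison.
parts = heading.get_text("|", strip=True).split("|")

print(brand)        # Rotring
print(brand_url)    # /rotring-x-b104743
print(description)  # 0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368
```

Since select_one returns None when nothing matches, it is worth checking heading and link before using them on pages where the element might be missing.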