I am using Python 3.8 and BeautifulSoup 4 to parse a website. The section I want to read is here:
<h1 class="pr-new-br">
<a href="/rotring-x-b104743">Rotring</a>
<span> 0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368 </span>
</h1>
I select this element and get its text with the following code (soup is the BeautifulSoup object for the page):
product_name_text = soup.select("h1.pr-new-br")[0].get_text()
However, this of course returns all of the text. I want to separate the text inside the <a href> tag
from the text inside the <span>.
How can I do this? How can I look specifically for a tag, or for the link in an href?
Thank you very much in advance. I am pretty new to the field, so sorry if this is very basic.
CodePudding user response:
The get_text method takes a separator argument that is placed between the text of the different elements. Passing strip=True as well trims each piece and drops the whitespace-only strings between the tags. As an example:
product_name_text = soup.select("h1.pr-new-br")[0].get_text('|', strip=True)
# You will get -> Rotring|0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368
# Then you can split on the same symbol to get a list with each element's text
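If you would rather target each tag directly instead of splitting a joined string, something like the following may work. It is only a sketch: the HTML is copied from the question into a string, and the names heading, link, brand, brand_url and description are made up for illustration. It assumes the <h1> contains exactly one <a> and one <span>.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question, used here in place of the real page
html = """
<h1 class="pr-new-br">
<a href="/rotring-x-b104743">Rotring</a>
<span> 0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368 </span>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first match (or None if nothing matches)
heading = soup.select_one("h1.pr-new-br")

# Look up each child tag on its own
link = heading.select_one("a")
brand = link.get_text(strip=True)            # text inside <a>
brand_url = link["href"]                     # value of the href attribute
description = heading.select_one("span").get_text(strip=True)  # text inside <span>

# The separator approach from above, split back into a list
parts = heading.get_text("|", strip=True).split("|")

print(brand)
print(brand_url)
print(description)
print(parts)
```

Note that select_one returns None when the selector matches nothing, so on a real page it is worth checking the result before calling get_text on it.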