How can I change the code so that the HTML tags do not appear in the output?

from bs4 import BeautifulSoup
import requests

url = 'https://www.mediacorp.sg/en/your-mediacorp/our-artistes/tca/male-artistes/ayden-sng-12357686'

artiste_name = 'celeb-name'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')

txt = soup.find_all('h1', attrs={'class':artiste_name})

print(txt)

With the above code, I get the output:

[<h1 class="celeb-name">Ayden Sng</h1>]

What do I need to change in my code so that I get only 'Ayden Sng' as my output?

CodePudding user response：

find_all returns a list of tag objects, so iterate over each entry of the txt list and extract its text property:

txt = [element.text for element in txt] # ['Ayden Sng']
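
To try this without hitting the site, here is a minimal sketch that runs offline. The html string is a hypothetical stand-in for the downloaded page (not the real markup), and it uses the built-in 'html.parser' instead of 'lxml' so nothing extra needs to be installed. It also uses get_text(strip=True), which behaves like .text but trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.text, so the sketch runs without a network call.
html = '<html><body><h1 class="celeb-name"> Ayden Sng </h1></body></html>'

soup = BeautifulSoup(html, 'html.parser')

# find_all returns a list of Tag objects, which is why printing it shows the tags.
elements = soup.find_all('h1', attrs={'class': 'celeb-name'})

# get_text(strip=True) returns only the text inside each tag, with whitespace trimmed.
names = [element.get_text(strip=True) for element in elements]
print(names)  # ['Ayden Sng']
```

If the page has several matching headings, names will simply contain one string per heading.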


CodePudding user response：

from bs4 import BeautifulSoup 
import requests

url = 'https://www.mediacorp.sg/en/your-mediacorp/our-artistes/tca/male-artistes/ayden-sng-12357686'

artiste_name = 'celeb-name'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')

txt = soup.find_all('h1', attrs={'class':artiste_name})

print(txt[0].text)

If there is more than one result, you can use this code:

from bs4 import BeautifulSoup 
import requests

url = 'https://www.mediacorp.sg/en/your-mediacorp/our-artistes/tca/male-artistes/ayden-sng-12357686'

artiste_name = 'celeb-name'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')

txt = soup.find_all('h1', attrs={'class':artiste_name})
for i in txt:
    print(i.text)

CodePudding user response：

Because 'h1', attrs={'class':artiste_name} matches only a single element on this page, you can use the find method, which returns that element directly instead of a list.

from bs4 import BeautifulSoup
import requests

url = 'https://www.mediacorp.sg/en/your-mediacorp/our-artistes/tca/male-artistes/ayden-sng-12357686'

artiste_name = 'celeb-name'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')
txt = soup.find('h1', attrs={'class':artiste_name})
print(txt.text)

Output:

Ayden Sng
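
One thing to watch with find: it returns None when nothing matches, so txt.text would raise an AttributeError if the page layout changes. A minimal sketch of a guard follows. The helper name first_heading_text and the sample HTML strings are made up for illustration, and 'html.parser' is used so it runs without lxml.

```python
from bs4 import BeautifulSoup


def first_heading_text(html, class_name):
    """Return the text of the first <h1> with the given class, or None if absent.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find('h1', class_=class_name)
    # find returns None when there is no match, so check before reading the text.
    if tag is None:
        return None
    return tag.get_text(strip=True)


# Made-up sample markup standing in for the real page.
print(first_heading_text('<h1 class="celeb-name">Ayden Sng</h1>', 'celeb-name'))
print(first_heading_text('<p>no heading here</p>', 'celeb-name'))
```

The first call prints Ayden Sng and the second prints None instead of crashing.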