My goal: I'm trying to build a function, def retrieve_title(html),
that takes a string of HTML as input and returns the text of the title element.
I've imported BeautifulSoup for this task. Any guidance is appreciated, as I'm still learning.
My attempted function:
def retrieve_title(html):
    soup = [html]
    result = soup.title.text
    return(result)
Using the function:
html = '<title>Jack and the bean stalk</title><header>This is a story about x y z</header><p>talk to you later</p>'
print(get_title(html))
Unexpected outcome:
"AttributeError: 'list' object has no attribute 'title'"
Expected outcome:
"Jack and the beanstalk"
CodePudding user response:
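soup = [html] only wraps the string in a list, which is why you get the AttributeError; the string has to be parsed with BeautifulSoup first.
Once it is parsed, "Jack and the bean stalk" is a text node directly inside the title tag,
so you can grab it with .find(text=True):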
html = '''
<title>
Jack and the beanstalk
</title>
<header>
This is a story about x y z
</header>
<p>
Once upon a time
</p>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
#print(soup.prettify())
title = soup.title.find(text=True)
print(title)
Output:
Jack and the beanstalk
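Putting this back into the function from the question, a minimal sketch could look like the one below. It assumes the bs4 package is installed and uses the built-in 'html.parser'. The None check and the use of get_text(strip=True) are my own additions, not something the question asked for; the strip removes the newlines that surround the title text when the HTML is spread over several lines.

```python
from bs4 import BeautifulSoup


def retrieve_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    # Parse the string; soup = [html] would only wrap it in a list.
    soup = BeautifulSoup(html, 'html.parser')
    # Assumption: return None instead of raising when no <title> exists.
    if soup.title is None:
        return None
    # strip=True trims whitespace and newlines around the title text.
    return soup.title.get_text(strip=True)


html = '<title>Jack and the bean stalk</title><header>This is a story about x y z</header><p>talk to you later</p>'
print(retrieve_title(html))
```

Note that the result keeps whatever spelling is in the HTML, so with the string from the question it prints "Jack and the bean stalk", with the space.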
CodePudding user response:
You defined the function as retrieve_title but call get_title. You have to call the function by the name you defined:
print(retrieve_title(html))