Why is my web scraping function returning something unexpected?

My goal: I'm trying to write a function, def retrieve_title(html), that takes a string of HTML as input and returns the text of the title element.

I've imported BeautifulSoup to complete this task. Any guidance is appreciated, as I'm still learning.

My attempted function:

def retrieve_title(html):
    soup = [html]
    result = soup.title.text
    return(result)

Using the function:

html = '<title>Jack and the bean stalk</title><header>This is a story about x y z</header><p>talk to you later</p>'
print(get_title(html))

Unexpected outcome:

"AttributeError: 'list' object has no attribute 'title'"

Expected outcome:

"Jack and the bean stalk"

CodePudding user response：

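The error comes from soup = [html], which wraps the string in a plain Python list instead of parsing it, so you need to pass the string to BeautifulSoup first. The title text is then a text node directly inside the title tag, so you can grab it with .find(text=True):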

    html = '''
    <title>
     Jack and the beanstalk     
    </title>
    <header>
     This is a story about x y z
    </header>
    <p>
     Once upon a time
    </p>
    '''
    
    from bs4 import BeautifulSoup
    
    soup = BeautifulSoup(html,'html.parser')
    
    #print(soup.prettify())
    
    title=soup.title.find(text=True)
    print(title)

Output:

 Jack and the beanstalk
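Note that the text node keeps the surrounding newlines and spaces from the markup. As a minimal sketch of one way to trim them, assuming a shortened version of the HTML above, get_text(strip=True) returns the text without the padding. The variable names raw and clean are only for illustration, and string=True is the newer spelling of the text=True argument (Beautiful Soup 4.4.0 and later):

```python
from bs4 import BeautifulSoup

# Shortened version of the HTML above; the title text is padded with whitespace
html = '''
<title>
 Jack and the beanstalk
</title>
'''

soup = BeautifulSoup(html, 'html.parser')

# The first text node inside <title>, including its newlines and spaces
raw = soup.title.find(string=True)

# get_text(strip=True) strips leading and trailing whitespace from the text
clean = soup.title.get_text(strip=True)

print(repr(raw))
print(clean)
```

Calling raw.strip() on the text node gives the same trimmed string.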

CodePudding user response：

You have to call the function by the name you defined. Your code calls get_title, but the function is named retrieve_title:

print(retrieve_title(html))
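Putting both points together, here is a rough sketch of what the corrected function could look like. It parses the string with BeautifulSoup instead of wrapping it in a list, and it is called by its defined name. Returning None when there is no title tag is my own assumption about the desired behaviour, not something stated in the question:

```python
from bs4 import BeautifulSoup


def retrieve_title(html):
    # Parse the HTML string; a plain list like [html] has no .title attribute
    soup = BeautifulSoup(html, 'html.parser')

    # soup.title is None when the document has no <title> tag
    # (returning None here is an assumption, adjust as needed)
    if soup.title is None:
        return None

    # Return the title text without surrounding whitespace
    return soup.title.get_text(strip=True)


html = '<title>Jack and the bean stalk</title><header>This is a story about x y z</header><p>talk to you later</p>'
print(retrieve_title(html))
```

With the HTML from the question this prints Jack and the bean stalk.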