AttributeError: 'NoneType' object has no attribute 'find_all'

I am having trouble understanding the following error:

AttributeError: 'NoneType' object has no attribute 'find_all'

It is referring to line 21 of the following code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.dwd.de/DE/wetter/wetterundklima_vorort/hessen/offenbach/_node.html'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')
soup

# obtain information from html tag <table>

table = soup.find('table', id='wetklitab')
table

# obtain information from html tag <tr>

headers = []
for i in table.find_all('tr'):
    title = i.text
    headers.append(title)
    print(f"{title}")

The line in question is the loop header, for i in table.find_all('tr'):. Can somebody explain the error and how to solve it? Thank you.

CodePudding user response：

Your error comes from the fact that table is None after the soup.find line: soup.find returns None when no element matches, and None has no find_all method. You can confirm that by testing table is None, which will give you True. The reason is that the table you are looking for does not actually have an id. Instead, it sits inside a div tag that carries that id.
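To make the None check concrete, here is a minimal sketch that runs offline. The HTML string is a made-up stand-in that only mimics the structure described above (a div with the id wrapping a table without one), and find_table is a hypothetical helper, not part of BeautifulSoup. It uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Made-up HTML that only mimics the described structure:
# the id sits on the <div>, not on the <table>.
html = """
<div id="wetklitab">
  <table>
    <tr><th>Zeit</th></tr>
    <tr><td>Donnerstag früh</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# No <table> carries the id, so find() returns None instead of raising.
missing = soup.find("table", id="wetklitab")
print(missing is None)


def find_table(soup, container_id):
    """Hypothetical helper: return the first <table> inside the <div>
    with the given id, failing with a readable message otherwise."""
    container = soup.find("div", id=container_id)
    if container is None:
        raise ValueError(f"no <div id={container_id!r}> in the document")
    table = container.find("table")
    if table is None:
        raise ValueError(f"no <table> inside <div id={container_id!r}>")
    return table


rows = [tr.get_text(strip=True) for tr in find_table(soup, "wetklitab").find_all("tr")]
print(rows)
```

Checking for None right after each find call turns the confusing AttributeError into a message that names the element that was not found.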

Here is a quick fix of your code.

import requests
from bs4 import BeautifulSoup

url = 'https://www.dwd.de/DE/wetter/wetterundklima_vorort/hessen/offenbach/_node.html'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')
soup

# obtain information from html tag <table>
div_ele = soup.find('div', id='wetklitab')
table = div_ele.find('table')
table

# obtain information from html tag <tr>

headers = []
for i in table.find_all('tr'):
    title = i.text
    headers.append(title)
    print(f"{title}")
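As a possible alternative to chaining two find calls, the same lookup can be sketched with a single CSS selector through select_one. The HTML below is again a made-up stand-in for the real page, so treat the structure as an assumption.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page (assumed structure).
html = '<div id="wetklitab"><table><tr><td>Donnerstag früh</td></tr></table></div>'
soup = BeautifulSoup(html, "html.parser")

# Two-step lookup: first the <div> with the id, then the <table> inside it.
table_chained = soup.find("div", id="wetklitab").find("table")

# One-step lookup: descendant selector "div#wetklitab table".
table_selected = soup.select_one("div#wetklitab table")

# select_one also returns None when nothing matches, so it needs the same check.
no_match = soup.select_one("table#wetklitab")

print(table_selected.get_text(strip=True))
print(no_match is None)
```

Either form works; the selector version keeps the lookup on one line but still has to be checked for None before calling find_all on the result.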

CodePudding user response：

The following code produces a ResultSet. Passing a User-Agent through the headers parameter of requests.get is mandatory.

import pandas as pd
import requests
from bs4 import BeautifulSoup


headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
url = 'https://www.dwd.de/DE/wetter/wetterundklima_vorort/hessen/offenbach/_node.html'

page = requests.get(url,headers=headers)


soup = BeautifulSoup(page.text, 'lxml')


table = soup.select('#wetklitab')[1]
for i in table.find_all('tr')[1:]:
    title = i.select_one('td').get_text()
    print(title)

Output:

Donnerstag früh
Donnerstag mittags
Donnerstag spät
Donnerstag nachts
Freitag früh
Freitag spät
Samstag früh
Samstag spät
Sonntag früh
Sonntag spät
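The row loop above assumes every row after the first contains a td. Here is a hedged sketch of the same idea that skips rows without one, since select_one returns None for those. The sample table, including the second column and the header-only row, is invented for illustration, and first_column is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Invented sample table; the real page may be laid out differently.
SAMPLE_HTML = """
<table>
  <tr><th>Zeit</th><th>Wetter</th></tr>
  <tr><td>Donnerstag früh</td><td>heiter</td></tr>
  <tr><th>Hinweis</th></tr>
  <tr><td>Donnerstag mittags</td><td>wolkig</td></tr>
</table>
"""


def first_column(table):
    """Hypothetical helper: text of the first <td> in each row after the header row."""
    labels = []
    for row in table.find_all("tr")[1:]:  # [1:] skips the header row
        cell = row.select_one("td")
        if cell is None:  # a row made only of <th> cells has no <td>
            continue
        labels.append(cell.get_text(strip=True))
    return labels


table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table")
labels = first_column(table)
print(labels)
```

Skipping rows where select_one returns None avoids a second AttributeError of the same kind as the one in the question.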