I am having trouble understanding the following error:
AttributeError: 'NoneType' object has no attribute 'find_all'
It is referring to line 21 of the following code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.dwd.de/DE/wetter/wetterundklima_vorort/hessen/offenbach/_node.html'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
soup
# obtain information from html tag <table>
table = soup.find('table', id='wetklitab')
table
# obtain information from html tag <tr>
headers = []
for i in table.find_all('tr'):
    title = i.text
    headers.append(title)
    print(f"{title}")
That is the line for i in table.find_all('tr'):
Can somebody explain the error and how to solve it? Thank you.
CodePudding user response:
Your error comes from the fact that table is None after the soup.find line. You can confirm that by testing table is None, which will give you True. The reason is that the table you are looking for does not actually have an id. Instead, it sits under a div tag that has that id.
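To see the failure mode in isolation, here is a minimal sketch that runs without any network access. The SAMPLE_HTML string is a made-up stand-in for the page, assuming only what is described above (the id is on the div, not on the table), and html.parser is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: the id sits on the <div>, not the <table>.
SAMPLE_HTML = """
<div id="wetklitab">
  <table>
    <tr><td>row 1</td></tr>
    <tr><td>row 2</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No <table> carries the id, so find() returns None instead of raising.
table_by_id = soup.find("table", id="wetklitab")
print(table_by_id is None)  # True

# The <div> does carry the id, and the table can be reached through it.
div_ele = soup.find("div", id="wetklitab")
table = div_ele.find("table")
print(len(table.find_all("tr")))  # 2
```

Calling table_by_id.find_all('tr') here would raise the same AttributeError, because find_all is being looked up on None.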
Here is a quick fix of your code.
import requests
from bs4 import BeautifulSoup
url = 'https://www.dwd.de/DE/wetter/wetterundklima_vorort/hessen/offenbach/_node.html'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
soup
# obtain information from html tag <table>
div_ele = soup.find('div', id='wetklitab')
table = div_ele.find('table')
table
# obtain information from html tag <tr>
headers = []
for i in table.find_all('tr'):
    title = i.text
    headers.append(title)
    print(f"{title}")
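If the page layout changes again, the same AttributeError can come back. A slightly more defensive variant is sketched below. The helper name extract_row_texts and the sample HTML are hypothetical and only illustrate the idea of checking for None before calling find_all, so that a missing element produces a clear message.

```python
from bs4 import BeautifulSoup


def extract_row_texts(html, container_id):
    """Return the text of each <tr> in the first table under the given id.

    Hypothetical helper: raises ValueError with a clear message instead of
    letting a None result surface later as an AttributeError.
    """
    soup = BeautifulSoup(html, "html.parser")

    container = soup.find(id=container_id)
    if container is None:
        raise ValueError(f"no element with id={container_id!r} found")

    # The id may be on the table itself or on a wrapper around it.
    table = container if container.name == "table" else container.find("table")
    if table is None:
        raise ValueError(f"no <table> found under id={container_id!r}")

    return [row.get_text(" ", strip=True) for row in table.find_all("tr")]


# Made-up sample mirroring the structure described above.
sample = """
<div id="wetklitab">
  <table>
    <tr><td>row 1</td></tr>
    <tr><td>row 2</td></tr>
  </table>
</div>
"""
rows = extract_row_texts(sample, "wetklitab")
print(rows)
```

With the real page, the html argument would be page.text from the requests.get call above.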
CodePudding user response:
The following code produces a ResultSet. Passing a User-Agent through the headers parameter is mandatory.
import pandas as pd
import requests
from bs4 import BeautifulSoup
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
url = 'https://www.dwd.de/DE/wetter/wetterundklima_vorort/hessen/offenbach/_node.html'
page = requests.get(url,headers=headers)
soup = BeautifulSoup(page.text, 'lxml')
table = soup.select('#wetklitab')[1]
for i in table.find_all('tr')[1:]:
    title = i.select_one('td').get_text()
    print(title)
Output:
Donnerstag früh
Donnerstag mittags
Donnerstag spät
Donnerstag nachts
Freitag früh
Freitag spät
Samstag früh
Samstag spät
Sonntag früh
Sonntag spät