Similar questions have been asked many times before, but none of the suggested solutions fix my issue. I have created an HTML document, hosted on GitHub, that contains a table. The table is going to be used to store each player's username, password and user ID. Web scraping is working fine in another project of mine, but it is not working here.
Below is the Python script that scrapes the website I created:
from bs4 import BeautifulSoup
import requests
def getData():
    url = 'https://galaxy-indie-studio.github.io/Galaxy-Indie-Studio-Website/database.html'
    html_url = requests.get(url).text
    soup = BeautifulSoup(html_url, "lxml")
    database = soup.find_all('tr', class_="Player")
    for Players in database:
        username = Players.find('td', class_="Username").text
        password = Players.find('td', class_="Password").text
        userID = Players.find('td', class_="UserID").text
        print(f"Username in database {username}")
        print(f"Password in database {password}")
        print(f"UserID in database {userID}")
If I leave the .text on the end of any of the variables, I receive AttributeError: 'NoneType' object has no attribute. If I remove the .text, the value comes back as None. It's the same for username, password and userID.
Below is the code I have so far for the website that stores the table:
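For context on the error: `find` returns `None` when no matching element exists, and reading `.text` from `None` is what raises the `AttributeError`. Below is a minimal sketch of a guard against that. The helper name `text_or_default` is hypothetical, and the `SimpleNamespace` object only stands in for a real tag so the snippet runs without fetching the page.

```python
from types import SimpleNamespace


def text_or_default(tag, default=""):
    """Return the stripped text of a tag, or `default` if the lookup found nothing."""
    # find() returns None when no element matches,
    # so check for None before touching .text.
    if tag is None:
        return default
    return tag.text.strip()


# Stand-ins for find() results (hypothetical; real code would pass Players.find(...)).
found = SimpleNamespace(text=" BigTall12 ")
missing = None

print(text_or_default(found))           # BigTall12
print(text_or_default(missing, "n/a"))  # n/a
```

In the loop this could be called as `text_or_default(Players.find('td', class_="Username"))`. It stops the crash, but it does not explain why the lookup finds nothing in the first place.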
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Player Database</title>
</head>
<body>
<table border = 4px, bgcolor="black", width= 100%>
<tr>
<th width="150" height="20" bgcolor="lightgray">Username</th>
<th width="150" height="20" bgcolor="lightgray">Password</th>
<th width="150" height="20" bgcolor="lightgray">UserID</th>
</tr>
<tr , width="150" height="20">
<td align="center" bgcolor="lightgray",>BigTall12</td>
<td align = "center" bgcolor="lightgray",></td>
<td align="center" bgcolor="lightgray",></td>
</tr>
<tr ,width="150" height="20">
<td align="center" bgcolor="lightgray",></td>
<td align = "center" bgcolor="lightgray",></td>
<td align = "center" bgcolor="lightgray",></td>
</tr>
<tr ,width="150" height="20">
<td align="center" bgcolor="lightgray",></td>
<td align="center" bgcolor="lightgray",></td>
<td align="center"bgcolor="lightgray",></td>
</tr>
<tr ,width="150" height="20">
<td align="center" bgcolor="lightgray",></td>
<td align="center" bgcolor="lightgray",></td>
<td align="center" bgcolor="lightgray",></td>
</tr>
</table>
</body>
</html>
CodePudding user response:
That's because you don't have a `class` attribute in the `<td>` tags. You do have a `,class` attribute though, and bs4 won't recognise that.
So what I'm saying is, your HTML is wrong. Get rid of the commas before the class attributes in your source HTML.
For example:
`<td align="center" bgcolor="lightgray",>BigTall12</td>`
should be
`<td align="center" bgcolor="lightgray" >BigTall12</td>`
Or, fix it once you read in the html:
import requests
from bs4 import BeautifulSoup
def getData():
    url = 'https://galaxy-indie-studio.github.io/Galaxy-Indie-Studio-Website/database.html'
    html_url = requests.get(url).text
    # Turn the stray ",class" into a normal " class" attribute before parsing
    html_url = html_url.replace(',class', ' class')
    soup = BeautifulSoup(html_url, "lxml")
    database = soup.find_all('tr', class_="Player")
    for Players in database:
        username = Players.find('td', class_="Username").text
        password = Players.find('td', class_="Password").text
        userID = Players.find('td', class_="UserID").text
        print(f"Username in database {username}")
        print(f"Password in database {password}")
        print(f"UserID in database {userID}")
getData()
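The literal `.replace(',class', ' class')` only catches that one exact spelling. If the page also has commas in other spots, such as `<tr ,width="150">` or `bgcolor="lightgray",>`, a regular expression may cover more cases. This is a rough sketch, not a general HTML repair tool: `strip_stray_commas` is a hypothetical helper name, and the first sample string assumes the hosted page carries `class` attributes like the ones the script searches for.

```python
import re


def strip_stray_commas(html):
    """Replace a comma that sits between attributes with a space."""
    # Match a comma that follows a quote or whitespace and is followed
    # (after optional whitespace) by an attribute name or the closing '>'.
    return re.sub(r'(?<=["\'\s]),(?=\s*[A-Za-z>])', ' ', html)


# Assumed sample markup, modelled on the rows in the question.
cell = '<td align="center" bgcolor="lightgray",class="Username">BigTall12</td>'
row = '<tr ,width="150" height="20">'

print(strip_stray_commas(cell))
# <td align="center" bgcolor="lightgray" class="Username">BigTall12</td>
print(strip_stray_commas(row))
# <tr  width="150" height="20">
```

The result could be passed to `BeautifulSoup` in place of the `.replace` output. Note that the pattern does not look at tag boundaries, so it would also change a comma in visible text that happens to follow a space or quote and precede a letter. Fixing the source HTML is still the cleaner option.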