Having some difficulty finding out how to detect <a href> in Python

from bs4 import BeautifulSoup
import requests

page = requests.get('https://www.capitol.tn.gov/house/members/').text
soup = BeautifulSoup(page, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')
header = rows[0].find_all('th')
header_text = []

for item in header:
  header_text.append(item.get_text(strip=True))
  
# check header results
print(header_text)

# get rows
for row in rows:
  row_text = []
  a = row.find_all('a')
  td = row.find_all('td')
  for item in td:
    if item:
      row_text.append(item.get_text(strip=True))
    
  # check row results
  if len(row_text) > 0:
    print(row_text)

I'm sorry if this is a stupid question, but I'm having trouble working out how to get the 'href' values of the 'a' tags (the email addresses) to appear as the first item in each row. I've tried the insert() method, but it never gives me anything.

User response:

This does the job:

# get rows
for row in rows:
  row_text = []
  td = row.find_all('td')
  for item in td:
    # look for an <a class="email"> link inside this cell
    email = item.find("a", {"class": "email"})

    if email is not None:
      # keep the link target (the href attribute), not the link text
      email = email.get("href")
      row_text.append(email)

    if item:
      row_text.append(item.get_text(strip=True))
    
  # check row results
  if len(row_text) > 0:
    print(row_text)

The code checks whether each td cell contains an a tag that belongs to the class email. If it does, the code reads the href attribute from that tag, stores it in a variable named email, and appends it to the row_text list. Because the append happens when that cell is reached, the href lands immediately before the cell's own text rather than at the start of the row.
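
If the email really has to be the first item in every row, one option is to build the row text first and then call insert(0, ...) on the list. Note that list.insert() changes the list in place and returns None, which may be why it seemed to give nothing back. Below is a minimal sketch of that idea. The sample markup, the mailto: prefix, and the function name rows_with_email_first are assumptions made for illustration; the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real members table.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>District</th><th>Email</th></tr>
  <tr>
    <td>Jane Doe</td>
    <td>District 1</td>
    <td><a class="email" href="mailto:jane.doe@example.com">Email</a></td>
  </tr>
  <tr>
    <td>John Roe</td>
    <td>District 2</td>
    <td></td>
  </tr>
</table>
"""


def rows_with_email_first(html):
    """Return one list per data row, with the email address (if any) first."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            continue  # header row: it only has <th> cells
        row_text = [cell.get_text(strip=True) for cell in cells]
        link = row.find("a", class_="email")
        if link is not None and link.get("href"):
            href = link["href"]
            # Assumption: email links use the mailto: scheme; strip it if present.
            if href.startswith("mailto:"):
                href = href[len("mailto:"):]
            # insert() mutates row_text in place and returns None.
            row_text.insert(0, href)
        results.append(row_text)
    return results


for row in rows_with_email_first(SAMPLE_HTML):
    print(row)
```

Rows without an email link are returned unchanged, so the lists can differ in length. If every row needs the same number of columns, inserting an empty string in the else case would be one way to handle that.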