Scraping table with BeautifulSoup AttributeError: ResultSet object has no attribute 'find_all'

Time:06-13

I am new to web scraping and am attempting to scrape the FDA device database (I am aware there is an Excel export button and an API, but neither fits my use case). The goal is to parse the table and get the device information behind each link, as well as the information in the table's other columns. Here is the site:

https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpcd/classification.cfm?start_search=1&submission_type_id=&devicename=&productcode=&deviceclass=&thirdparty=&panel=&regulationnumber=&implant_flag=&life_sustain_support_flag=&summary_malfunction_reporting=&sortcolumn=deviceclassdesc&pagenum=10

The information I need is inside the table tag and then the tbody tag. However, when I run a for loop to check that I am getting the necessary information, I get this error:

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Following that suggestion and switching to find() gives the inverse error.

Here's my code (really short):

import requests
from bs4 import BeautifulSoup
        
url = 'https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpcd/classification.cfm?start_search=1&submission_type_id=&devicename=&productcode=&deviceclass=&thirdparty=&panel=&regulationnumber=&implant_flag=&life_sustain_support_flag=&summary_malfunction_reporting=&sortcolumn=deviceclassdesc&pagenum=10'
    
r = requests.get(url)
    
soup = BeautifulSoup(r.text, 'html.parser')
    
print(soup)
    
product_table = soup.find_all('table')
    
print(product_table)
    
for product in product_table.find_all('tbody'):
    rows = product.find_all('tr')
    for row in rows:
        pc_product = row.find('td', align_ ='left').text
        print(pc_product)

CodePudding user response:

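The error occurs because soup.find_all('table') returns a ResultSet, which is a list of tags, and a list has no find_all() method of its own; you would have to loop over it and call find_all() on each table. Each data point under device information also contains a table, so a CSS selector is a simpler way to pull the right data.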

import requests
from bs4 import BeautifulSoup

url = 'https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpcd/classification.cfm?start_search=1&submission_type_id=&devicename=&productcode=&deviceclass=&thirdparty=&panel=&regulationnumber=&implant_flag=&life_sustain_support_flag=&summary_malfunction_reporting=&sortcolumn=deviceclassdesc&pagenum=10'

r = requests.get(url)

soup = BeautifulSoup(r.text, 'html.parser')

# select() takes a CSS selector and returns every matching <a> tag.
# The [11:] slice skips the first 11 links, which precede the device rows;
# that offset is specific to this page's layout.
product_tables = soup.select('table tbody tr td a')[11:]
for product_table in product_tables:
    # get_text(strip=True) returns the link text without surrounding whitespace.
    print(product_table.get_text(strip=True))

Output:

device, cpr assist
heart valve, more than minimally manipulated allograft
cleanser, root canal
saliva, artificial
locator, root apex
device, electrical dental anesthesia
mouthguard, prescription
cord, retraction
mouthguard, over-the-counter
mouthguard, migraine/tension headache
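
If you would rather keep the loop from the question, here is a minimal sketch of how it could be fixed. It runs on a small made-up HTML snippet instead of the live FDA page, so the markup, the href values and the "other" column are assumptions for illustration only. It shows that calling find_all() on a ResultSet raises the AttributeError, and that looping over the ResultSet and calling find_all() on each Tag avoids it. It also uses attrs={"align": "left"} instead of align_='left', because class_ is the only keyword with a trailing underscore; align_ would be treated as an attribute literally named "align_".

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the FDA page: made-up markup, not the real HTML.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr>
      <td align="left"><a href="/device?id=1">cord, retraction</a></td>
      <td align="left">Dental</td>
    </tr>
    <tr>
      <td align="left"><a href="/device?id=2">saliva, artificial</a></td>
      <td align="left">Dental</td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns a ResultSet (a list of Tags), which has no find_all().
tables = soup.find_all("table")
try:
    tables.find_all("tbody")
    error_raised = False
except AttributeError:
    error_raised = True

# Fix: loop over the ResultSet and call find_all() on each individual Tag.
devices = []
for table in tables:
    for row in table.find_all("tr"):
        cells = row.find_all("td", attrs={"align": "left"})
        if not cells:
            continue  # skip rows without left-aligned cells (e.g. headers)
        link = cells[0].find("a")
        devices.append({
            "name": cells[0].get_text(strip=True),
            # The href is what you would request next to get the device details.
            "href": link["href"] if link else None,
            "other": [cell.get_text(strip=True) for cell in cells[1:]],
        })

for device in devices:
    print(device)
```

On the real page the tables are nested, so table.find_all("tr") on an outer table will also return the rows of the inner ones; you would probably need to narrow the search to the one table that holds the results before looping.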