I am new to web scraping and am attempting to scrape the FDA device database (I am aware there is an Excel export button and an API, but neither fits my use case). The goal is to parse the results table and get the device information behind each link, as well as the information in the other columns of the table. The site is the search-results page whose URL appears in the code below.
The information I need is in the <table> tag and then the <tbody> tag. However, when I run a for loop to check that I am getting the necessary information, I get this error:
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Following the suggestion gives the inverse error.
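For context, the message comes from the return types: `find_all()` returns a `ResultSet`, which is a list of `Tag` objects, while `find()` returns a single `Tag` (or `None` when nothing matches). Only a `Tag` has its own `find_all()` method, so you either loop over the `ResultSet` or call `find()` once. A minimal sketch on a made-up two-table snippet (not the FDA page):

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Hypothetical snippet standing in for a real page.
html = """
<table><tbody><tr><td>a</td></tr></tbody></table>
<table><tbody><tr><td>b</td></tr></tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table")  # ResultSet: a list of Tag objects
first = soup.find("table")       # a single Tag (or None if nothing matches)

print(isinstance(tables, ResultSet), isinstance(first, Tag))  # True True

# tables.find_all("tbody") would raise the AttributeError; iterate instead.
for table in tables:
    for tbody in table.find_all("tbody"):
        print(tbody.get_text(strip=True))  # prints "a", then "b"
```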
Here's my code (really short):
import requests
from bs4 import BeautifulSoup

url = 'https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpcd/classification.cfm?start_search=1&submission_type_id=&devicename=&productcode=&deviceclass=&thirdparty=&panel=&regulationnumber=&implant_flag=&life_sustain_support_flag=&summary_malfunction_reporting=&sortcolumn=deviceclassdesc&pagenum=10'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup)
product_table = soup.find_all('table')
print(product_table)
for product in product_table.find_all('tbody'):
    rows = product.find_all('tr')
    for row in rows:
        pc_product = row.find('td', align_ ='left').text
        print(pc_product)
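One way to restructure the loop above is to iterate over each table in the `ResultSet` before searching inside it, and to skip rows where the cell is missing instead of calling `.text` on `None`. Note also that `class_` carries a trailing underscore only because `class` is a Python keyword; `align` is not, so the filter is normally written `align='left'` (with `align_`, BeautifulSoup looks for an attribute literally named `align_`). The sketch runs on a small hypothetical snippet; whether the real FDA cells carry `align="left"` is an assumption to check against the page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's cells and attributes may differ.
html = """
<table>
  <tbody>
    <tr><td align="left">saliva, artificial</td><td>col 2</td></tr>
    <tr><td align="left">cord, retraction</td><td>col 2</td></tr>
    <tr><th>a row with no left-aligned td</th></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

products = []
for table in soup.find_all("table"):        # loop over the ResultSet
    for tbody in table.find_all("tbody"):   # each item is a Tag, so find_all works
        for row in tbody.find_all("tr"):
            cell = row.find("td", align="left")  # plain attribute keyword
            if cell is None:                     # skip rows without that cell
                continue
            products.append(cell.get_text(strip=True))

print(products)  # ['saliva, artificial', 'cord, retraction']
```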
Answer:
Each data point under the device information is also inside a table, so you can use a CSS selector to pull out the right data.
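As a quick illustration of how such a selector reads: `table tbody tr td a` matches every `<a>` nested inside a `<td>` of a table row, and an attribute selector such as `a[href*="..."]` can narrow that to links whose `href` contains a given substring. The markup below is hypothetical, and the `classification.cfm?id=` pattern is only an assumption about how the FDA detail links look; verify it in the page source before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a navigation link plus one results row with two links.
html = """
<a href="/home">Home</a>
<table><tbody><tr>
  <td><a href="classification.cfm?id=1">device, cpr assist</a></td>
  <td><a href="help.cfm">?</a></td>
</tr></tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant selector: every <a> inside a <td> of a table row.
all_cell_links = soup.select("table tbody tr td a")

# Attribute selector: only links whose href contains the (assumed) detail-page pattern.
device_links = soup.select('td a[href*="classification.cfm?id="]')

print([a.get_text(strip=True) for a in all_cell_links])  # ['device, cpr assist', '?']
print([a.get_text(strip=True) for a in device_links])    # ['device, cpr assist']
```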
import requests
from bs4 import BeautifulSoup

url = 'https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpcd/classification.cfm?start_search=1&submission_type_id=&devicename=&productcode=&deviceclass=&thirdparty=&panel=&regulationnumber=&implant_flag=&life_sustain_support_flag=&summary_malfunction_reporting=&sortcolumn=deviceclassdesc&pagenum=10'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

# Every <a> inside a table cell; [11:] skips the first 11 matches,
# which are not device names on this page
product_tables = soup.select('table tbody tr td a')[11:]
for product_table in product_tables:
    # Link text only, with surrounding whitespace stripped
    print(product_table.get_text(strip=True))
Output:
device, cpr assist
heart valve, more than minimally manipulated allograft
cleanser, root canal
saliva, artificial
locator, root apex
device, electrical dental anesthesia
mouthguard, prescription
cord, retraction
mouthguard, over-the-counter
mouthguard, migraine/tension headache
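The answer prints only the link text. Since the question also asks for the information behind each link and the other columns, one option is to walk the rows rather than the links: keep every cell's text and turn each relative `href` into an absolute URL with `urllib.parse.urljoin`. Selecting rows this way also avoids depending on a fixed `[11:]` offset. The sketch below is an assumption-laden outline: the sample table, its column layout, and the `parse_rows` helper are hypothetical, not the real page structure.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpcd/classification.cfm"

# Hypothetical results table; the real column order and hrefs may differ.
html = """
<table><tbody>
  <tr><th>Device</th><th>Other</th></tr>
  <tr><td><a href="classification.cfm?id=1">saliva, artificial</a></td><td>col 2</td></tr>
  <tr><td><a href="classification.cfm?id=2">cord, retraction</a></td><td>col 2</td></tr>
</tbody></table>
"""

def parse_rows(page_html, base_url):
    """Hypothetical helper: one dict per data row with all cell texts and the absolute link URL."""
    soup = BeautifulSoup(page_html, "html.parser")
    records = []
    for row in soup.select("table tbody tr"):
        cells = row.find_all("td")
        link = row.select_one("td a[href]")
        if not cells or link is None:  # skip header rows and rows without a link
            continue
        records.append({
            "cells": [td.get_text(strip=True) for td in cells],
            "detail_url": urljoin(base_url, link["href"]),
        })
    return records

for record in parse_rows(html, BASE_URL):
    print(record)
```

Each `detail_url` could then be fetched with `requests.get` and parsed the same way, ideally with a short pause between requests.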