Scraping and writing the table into dataframe shows me TypeError


I am trying to scrape a table and write it into a dataframe, but the code raises a TypeError. How do I resolve this error?

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium import webdriver
import pandas as pd
temp=[]
driver = webdriver.Chrome(r'C:\Program Files (x86)\chromedriver.exe')
driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
headers=WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='sites']//thead"))).text
rows=WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='sites']//tbody"))).text
temp.append(rows)
df = pd.DataFrame(temp,columns=headers)
print(df)

In headers I receive the text FAMI-QS Number ... Expiry date, while in rows I receive FAM-0694 ... 2022-09-04 — both come back as single strings, and passing a string as the columns argument of pd.DataFrame is what triggers the TypeError.
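The root cause is that `.text` on the `<thead>`/`<tbody>` elements returns one newline-joined string, while `pd.DataFrame(..., columns=...)` expects a collection of column names. A minimal sketch of the fix, using simplified two-column sample text (the real column names and split logic will depend on how the page renders whitespace):

```python
import pandas as pd

# .text on the <thead>/<tbody> elements returns single strings,
# e.g. (simplified to two columns for illustration):
headers = "FAMI-QS Number Expiry date"                     # one string, not a list
rows = "FAM-0694 2022-09-04\nFAM-1491 2022-10-17"

# Passing the raw strings reproduces the TypeError: pandas expects
# `columns` to be a collection of names, not a single string.
try:
    pd.DataFrame([rows], columns=headers)
except TypeError as e:
    print("TypeError:", e)

# Fix: split into a list of column names and a list of row lists first.
# The real column names contain spaces, so they are listed explicitly
# here; each sample row ends in a date token, so rsplit() separates it.
columns = ["FAMI-QS Number", "Expiry date"]
data = [r.rsplit(" ", 1) for r in rows.splitlines()]

df = pd.DataFrame(data, columns=columns)
print(df)
```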

CodePudding user response:

To scrape the FAMI QS Number and Site Name columns you need to build lists of the desired texts using a list comprehension, inducing WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategies:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    import pandas as pd

    s = Service(r'C:\Program Files (x86)\chromedriver.exe')  # driver path from the question
    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(service=s, options=options)
    driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
    FAMI_QS_Numbers = []
    Site_Names = []
    WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
    FAMI_QS_Numbers.extend([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='sites']//tbody//tr/descendant::td[1]")))])
    Site_Names.extend([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='sites']//tbody//tr//td/p")))])
    df = pd.DataFrame(data=list(zip(FAMI_QS_Numbers, Site_Names)), columns=['FAMI QS Number', 'Site Name'])
    print(df)
    driver.quit()
    
  • Console Output:

      FAMI QS Number                             Site Name
    0       FAM-1293                    AmTech Ingredients
    1       FAM-0841                    3F FEED & FOOD S L
    2       FAM-1361                5N Plus Additives GmbH
    3    FAM-1301-01                   A & V Corp. Limited
    4       FAM-1146  A.   E. Fischer-Chemie GmbH & Co. KG
    5       FAM-1589          A.M FOOD CHEMICAL CO LIMITED
    6    FAM-0613-01                          A.W.P. S.r.l
    7       FAM-0867             AB AGRI POLSKA Sp. z o.o.
    8    FAM-1510-02                              AB Vista
    9    FAM-1510-01                            AB Vista *
    

CodePudding user response:

You can get all of the table data from the API call's HTML response using only requests and pandas as follows:

Code:

import requests
import pandas as pd

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'}

url = "https://famiqs.viasyst.net/certified-sites"

req = requests.get(url,headers=headers)

table = pd.read_html(req.text)

df = table[0]  # .to_csv('info.csv', index=False) to save it instead

print(df)

Output:

    FAMI-QS Number  ... Expiry date
0          FAM-0694  ...  2022-09-04
1          FAM-1491  ...  2022-10-17
2     FAM-ISFSF-003  ...  2022-10-27
3          FAM-1533  ...  2022-10-31
4          FAM-1090  ...  2022-11-13
...             ...  ...         ...
1472    FAM-1761-01  ...  2024-10-27
1473       FAM-1796  ...  2024-09-29
1474    FAM-1427-01  ...  2023-12-01
1475       FAM-1861  ...  2024-11-22
1476    FAM-0005-07  ...  2024-11-25

[1477 rows x 7 columns]
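One caveat, assuming a recent pandas version: passing a literal HTML string to pd.read_html() is deprecated in pandas 2.1+, so the response text should be wrapped in a file-like StringIO object. A sketch using a small inline table (the same wrapping applies to req.text from the answer above):

```python
from io import StringIO

import pandas as pd

# Stand-in for req.text: a tiny HTML table shaped like the real one.
html = """
<table>
  <thead><tr><th>FAMI-QS Number</th><th>Expiry date</th></tr></thead>
  <tbody>
    <tr><td>FAM-0694</td><td>2022-09-04</td></tr>
    <tr><td>FAM-1491</td><td>2022-10-17</td></tr>
  </tbody>
</table>
"""

# pandas 2.1+ deprecates literal HTML input; wrap it in StringIO.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```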