Scraping the name of a dataset on Kaggle using Python

Time:02-10

Hi, how can I get the name of a dataset on Kaggle using Beautiful Soup, Selenium, or Scrapy? I tried this code, but it returns nothing:

from bs4 import BeautifulSoup
import requests

url = 'https://www.kaggle.com/heptapod/titanic'
res = requests.get(url)
html_page = res.content

soup = BeautifulSoup(html_page, 'html.parser')
datasetName = soup.find('h5',{'class':'sc-dIvrsQ sc-hHEiqL sc-kaPsuu kSVYRu ccTnQh ffXPrd'})

print(datasetName)

See the picture: inspect element from Kaggle.

Answer:

Using Selenium

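import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

opt = Options()
opt.add_argument('--headless')  # run Chrome without opening a window
# Selenium 4: pass the driver path through Service (executable_path is deprecated)
driver = webdriver.Chrome(service=Service('yourdriverpath'), options=opt)

driver.get("https://www.kaggle.com/heptapod/titanic")
time.sleep(5)  # give the JavaScript-rendered page time to load
datasetname = driver.find_element(By.XPATH, "//div[@role='button']//div//div").text
print(datasetname)
driver.quit()  # close the browser when done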

Output:

train_and_test2.csv

Process finished with exit code 0

dataset snapshot
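Note that the output above looks like the name of a file inside the dataset rather than the dataset's own name. If the identifier from the URL is enough, it may be possible to skip scraping altogether. The following is only a sketch: the helper name dataset_slug is hypothetical, and it assumes dataset URLs have the form /<owner>/<dataset> or /datasets/<owner>/<dataset>, which is an assumption and not something confirmed in this thread.

```python
from typing import Tuple
from urllib.parse import urlparse


def dataset_slug(url: str) -> Tuple[str, str]:
    """Return (owner, dataset) read from the path of a Kaggle dataset URL.

    Hypothetical helper: it only parses the URL string and never
    contacts Kaggle, so it cannot tell whether the dataset exists.
    """
    # Split the path into its non-empty segments.
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Assumption: some URLs carry a leading "datasets" segment.
    if parts and parts[0] == "datasets":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"not a dataset URL: {url}")
    return parts[0], parts[1]


owner, name = dataset_slug("https://www.kaggle.com/heptapod/titanic")
print(owner, name)  # prints: heptapod titanic
```

This gives the URL slug, which may differ from the title displayed on the page. For the displayed title, a browser-driven approach such as the Selenium answer above is still needed.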
