I want to build a recommendation system for webtoons, so I am collecting webtoon data. Currently, I have written code to scrape the URLs of the toons on the Kakao Webtoon page.
```python
def extract_from_page(page_link):
    links = []
    driver = webdriver.Chrome()
    driver.get(page_link)
    elems = driver.find_elements_by_css_selector(".h-full.relative")
    for elem in elems:
        link = elem.get_attribute('href')
        if link:
            links.append({'id': int(link.split('/')[-1]), 'link': link})
    print(len(links))
    return links
```
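As a side note, the `find_elements_by_*` methods were removed in Selenium 4 in favor of `driver.find_elements(By.CSS_SELECTOR, ...)`. Separately, the href-parsing step can be factored into a pure helper that is testable without a browser. A minimal sketch (the helper name `parse_links` is my own; the sample URL is taken from the output shown below):

```python
def parse_links(hrefs):
    """Turn a list of href strings into {'id', 'link'} records,
    skipping empty values and taking the trailing path segment
    as the numeric id, as in the question's code."""
    records = []
    for link in hrefs:
        if link:
            records.append({'id': int(link.split('/')[-1]), 'link': link})
    return records

# Kakao Webtoon hrefs look like .../content/<title>/<id>
sample = [
    "https://webtoon.kakao.com/content/브레이커2/596",
    None,  # elements without an href are skipped
]
print(parse_links(sample))
# → [{'id': 596, 'link': 'https://webtoon.kakao.com/content/브레이커2/596'}]
```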
This code works on the weekly pages (https://webtoon.kakao.com/original-webtoon, https://webtoon.kakao.com/original-novel).
However, on the page that shows finished toons (https://webtoon.kakao.com/original-webtoon?tab=complete), it only retrieves 13 URLs, for the 13 webtoons at the top of the page.
I found a similar post (web scraping gives only first 4 elements on a page) and added scrolling, but nothing changed.
I would appreciate it if you could tell me the cause and a solution.
CodePudding user response:
Try like below. The complete tab lazy-loads items as you scroll, so only the first batch is in the DOM at page load; you need to scroll the last found element into view and collect again:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://webtoon.kakao.com/original-webtoon?tab=complete")
wait = WebDriverWait(driver, 30)

j = 1
for i in range(5):
    # Wait for the elements to load/appear
    wait.until(EC.presence_of_all_elements_located((By.XPATH, "//a[contains(@href,'content')]")))
    # Get all the elements which contain an href value
    links = driver.find_elements(By.XPATH, "//a[contains(@href,'content')]")
    # Iterate to print the links
    for link in links:
        print(f"{j} : {link.get_attribute('href')}")
        j += 1
    # Scroll to the last element of the list links
    driver.execute_script("arguments[0].scrollIntoView(true);", links[-1])
```
Output:

```
1 : https://webtoon.kakao.com/content/밤의-향/1532
2 : https://webtoon.kakao.com/content/브레이커2/596
3 : https://webtoon.kakao.com/content/토이-콤플렉스/1683
...
```
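One caveat: the fixed `range(5)` loop re-collects already-seen links on every pass and may stop before a long list is fully loaded. A common pattern is to keep scrolling until the element count stops growing, deduplicating as you go. The browser calls are stubbed out below as hypothetical `fetch_hrefs` and `scroll` callables so the loop logic itself can be shown and tested without a driver:

```python
def collect_until_stable(fetch_hrefs, scroll, max_rounds=50):
    """Repeatedly fetch hrefs and scroll; stop when no new links appear.
    fetch_hrefs: returns the hrefs currently present in the DOM.
    scroll: scrolls the last element into view (side effect only)."""
    seen = []
    seen_set = set()
    for _ in range(max_rounds):
        new = 0
        for href in fetch_hrefs():
            if href not in seen_set:
                seen_set.add(href)
                seen.append(href)
                new += 1
        if new == 0:  # nothing new appeared after the last scroll
            break
        scroll()
    return seen

# Simulated lazy-loading page: each "scroll" reveals the next batch.
batches = [["a/1", "a/2"], ["a/2", "a/3"], ["a/3"]]
state = {"i": 0}
fetch = lambda: batches[min(state["i"], len(batches) - 1)]
scroll = lambda: state.__setitem__("i", state["i"] + 1)
print(collect_until_stable(fetch, scroll))
# → ['a/1', 'a/2', 'a/3']
```

With a real driver, `fetch_hrefs` would wrap the `find_elements` call above and `scroll` the `scrollIntoView` script.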