Scraping GIF url from Websites

Time:12-27

I am very new to web scraping and am trying to scrape GIF URLs from a website. For example, on gifer.com I want to search for "smile" and then collect the URLs of all the GIFs listed. Below is an example of the page source; I want to extract the src attribute of the video's source element (https://i.gifer.com/ON0.mp4 in this case).

<div >
  <div >
    <div >
      <span  style="color: rgb(255, 255, 255); font-size: 44px;"></span>
    </div>
    <div  style="width: 367.462px;">
      <div style="padding-top: 122.462%;">
        <div >
          <div  style="width: 367.462px;">
            <div>
              <video poster="https://i.gifer.com/fetch/w300-preview/d0/d0e6e89a42c43d31b5913e232d87af7b.gif"  loop="" autoplay="" playsinline="">
                <source src="https://i.gifer.com/ON0.mp4" type="video/mp4">
              </video>
            </div>
          </div>
        </div>
      </div>
    </div>
    <div >
      <span  style="color: rgb(255, 255, 255); font-size: 44px;">
      </span>
    </div>
  </div>
</div>

There are thousands of such results, and I was advised to use Python and Selenium. However, my knowledge of Selenium and Python is limited. I tried the code below, but I am not making much headway.


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

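options = Options()
# options.headless was deprecated and later removed in Selenium 4; use the Chrome flag instead
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1200")
# Pass the options to the driver; otherwise they are never applied
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)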

driver.get("https://gifer.com/en/gifs/smile")
imgResults = driver.find_elements(By.CLASS_NAME, "media-container2")

print(len(imgResults))
#print(driver.page_source)
for element in imgResults:
    print(element)

driver.quit()

The above returns 4 elements:

<selenium.webdriver.remote.webelement.WebElement (session="fac424650675a90b2a8dee91efdc01f4", element="16e771ca-37d8-45a0-8200-0f03da0b7d14")>
<selenium.webdriver.remote.webelement.WebElement (session="fac424650675a90b2a8dee91efdc01f4", element="8c9abdcb-bc9d-47da-9958-109e722b3ae9")>
<selenium.webdriver.remote.webelement.WebElement (session="fac424650675a90b2a8dee91efdc01f4", element="d9640144-4ba1-414b-aa4f-5141387335ef")>
<selenium.webdriver.remote.webelement.WebElement (session="fac424650675a90b2a8dee91efdc01f4", element="9626db84-1da9-42ad-b314-56222a5e933b")>

What I cannot figure out is how to grab the src link of the source element for each video.

CodePudding user response:

I was wrong: there is no need to load a new page to get the mp4 link. The last path segment of each result's href is the code used in the mp4 URL:

for img in driver.find_elements(By.CSS_SELECTOR, "figure a"):
    code = img.get_attribute('href').split('/')[-1]
    link = f'https://i.gifer.com/{code}.mp4'
    print(link)

Output:

https://i.gifer.com/fzvh.mp4
https://i.gifer.com/7F5y.mp4
https://i.gifer.com/6qOR.mp4
https://i.gifer.com/3JT.mp4
...
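
The URL construction above could also be factored into a small helper that tolerates missing hrefs, trailing slashes and query strings. This is only a sketch: the function name mp4_link_from_href is made up for illustration, and it assumes (as the loop above does) that the last path segment of the href is the code used on i.gifer.com.

```python
from typing import Optional
from urllib.parse import urlsplit


def mp4_link_from_href(href: Optional[str]) -> Optional[str]:
    """Build the i.gifer.com mp4 URL from a result link's href.

    Hypothetical helper: assumes the last path segment of the href
    is the GIF code, which is what the loop above relies on.
    """
    if not href:
        # get_attribute('href') returns None when the attribute is absent
        return None
    # Keep only the path, so query strings and fragments are ignored
    path = urlsplit(href).path.rstrip("/")
    code = path.rsplit("/", 1)[-1]
    return f"https://i.gifer.com/{code}.mp4" if code else None
```

With a Selenium driver it would be used as mp4_link_from_href(img.get_attribute('href')) inside the loop, skipping any None results.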

You can obtain the list of links in one line:

links = [f"https://i.gifer.com/{img.get_attribute('href').split('/')[-1]}.mp4" for img in driver.find_elements(By.CSS_SELECTOR, "figure a")]
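
As an alternative that matches the markup in the question, the src attribute can be read directly from the source elements nested inside each video. The sketch below is an assumption rather than something verified against the live site: the selector "video source" is inferred from the HTML snippet in the question, the function name collect_video_sources is hypothetical, and the page may only contain video elements for results that are currently rendered.

```python
from typing import List


def collect_video_sources(driver, selector: str = "video source") -> List[str]:
    """Return the unique src values of matching elements, in page order.

    `driver` is expected to be a Selenium WebDriver. The locator strategy
    is passed as the plain string "css selector", which is the value of
    selenium.webdriver.common.by.By.CSS_SELECTOR.
    """
    sources = []
    for element in driver.find_elements("css selector", selector):
        src = element.get_attribute("src")
        if src:  # skip elements without a src attribute
            sources.append(src)
    # dict.fromkeys removes duplicates while preserving order
    return list(dict.fromkeys(sources))
```

It would be called as collect_video_sources(driver) after driver.get(...) and before driver.quit().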