Can't find video downloaded with Selenium on Heroku - Django

Time:08-18

I'm working on the following application, but I've hit an error that I can't figure out how to fix.

The application works as follows: the user adds a link to a video, the task is sent to Celery via Redis, Celery downloads that video and saves it to media/AppVimeoApp/videos/.mp4, and the video is then displayed on a page with src="media\AppVimeoApp\videos....mp4".

This app works fine locally. On Heroku, however, the task is reported as successful but the video is nowhere to be found. I only need the file temporarily.

tasks.py

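import os
import time

from celery import shared_task
from django.conf import settings
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By


@shared_task
def download_video():
    # Headless Chrome configuration for the Heroku buildpack binaries.
    chrome_options = webdriver.ChromeOptions()
    chrome_options.binary_location = str(os.getenv('GOOGLE_CHROME_BIN'))
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--start-maximized")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-extensions")
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument('--disable-software-rasterizer')
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 640 XL LTE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.10166")
    chrome_options.add_argument("--disable-notifications")
    chrome_options.add_argument('--window-size=1920,1080')

    # Build the path with os.path.join so it is valid on Linux (Heroku) as well
    # as on Windows; hard-coded backslashes only work on Windows.
    download_dir = os.path.join(settings.MEDIA_ROOT, "AppVimeoApp", "videos")

    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing_for_trusted_sources_enabled": False,
        "safebrowsing.enabled": False
        }
    )

    # `options=` replaces the deprecated `chrome_options=` keyword.
    driver = webdriver.Chrome(executable_path=str(os.getenv('CHROMEDRIVER_PATH')), options=chrome_options)

    driver.get('videolink')

    time.sleep(5)

    # Retry once per second until the download link can be clicked.
    while True:
        try:
            driver.find_element(by=By.XPATH, value='/html/body/div[2]/div[1]/div[6]/div[2]/div[1]/a').click()
            break
        except WebDriverException:
            # Catch Selenium errors only, so Ctrl+C and real bugs still surface.
            time.sleep(1)
            continue

    # Give Chrome a moment to start writing the file before the task returns.
    time.sleep(5)

    return ("Downloaded")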

I would be very grateful for any help. Thank you.

CodePudding user response:

"celery downloads that video and saves it"

That won't work on Heroku.

Each dyno has its own isolated ephemeral filesystem. Whatever process is running Celery (often a worker process) will save the file to its own filesystem, which is different from the filesystem seen by your web dynos.

Furthermore, all of these filesystems lose changes whenever their dyno restarts. This happens unpredictably and frequently, at least once per day. So saving files this way isn't a great solution on Heroku even from a single dyno.

A better solution would be to save the video to a third-party object store like Amazon S3 or Azure Blob Storage. Heroku has documentation for using S3 this way. Since you want to process the file you'll need to do what Heroku calls a pass-through upload:

"In a pass-through upload, a file uploads to your app, which in turn uploads it to S3. This method enables you to perform preprocessing on user uploads before you push them to S3."
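As a rough illustration of the upload half of that idea, here is a minimal sketch of what the worker could do once the file is on its local disk. It assumes boto3 is installed and that AWS credentials are available in the environment. The helper names `upload_video` and `temporary_url`, the `videos/` key prefix, and the `S3_BUCKET_NAME` config var are hypothetical choices for this example, not something from the question. `upload_file` and `generate_presigned_url` are methods of the boto3 S3 client.

```python
import os


def upload_video(s3_client, local_path, bucket, key_prefix="videos/"):
    """Upload one local file to S3 and return the object key it was stored under."""
    key = key_prefix + os.path.basename(local_path)
    # boto3's S3 client takes (Filename, Bucket, Key) here.
    s3_client.upload_file(local_path, bucket, key)
    return key


def temporary_url(s3_client, bucket, key, expires_in=3600):
    """Return a time-limited URL that a web dyno can use as the <video> src."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


# Hypothetical usage inside the Celery task, after the download has finished:
#
#   import boto3
#   client = boto3.client("s3")
#   bucket = os.environ["S3_BUCKET_NAME"]   # assumed config var name
#   key = upload_video(client, "/tmp/example.mp4", bucket)
#   url = temporary_url(client, bucket, key)
```

Since the file is only needed temporarily, a presigned URL with a short expiry may be enough. The task can return the key, or the URL, and the web process can read it from the task result instead of looking on its own disk.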

You'll have to do any processing of the file (such as the image overlay) on the same worker where you download it, due to the aforementioned filesystem isolation.
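One way to keep everything on the worker is sketched below, under some assumptions: the task waits until Chrome has finished writing the file, hands it to a storage step, and then deletes the local copy. The function names, the `store` callback, and the timeout values are hypothetical. The sketch relies on Chrome naming in-progress downloads with a `.crdownload` suffix, and on the finished video ending in `.mp4` as in the question.

```python
import os
import time


def wait_for_download(download_dir, timeout=120, poll_interval=1.0):
    """Return paths of finished .mp4 files once no partial downloads remain."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        # Chrome keeps a *.crdownload file while a download is still in progress.
        in_progress = [n for n in names if n.endswith(".crdownload")]
        finished = sorted(n for n in names if n.endswith(".mp4"))
        if finished and not in_progress:
            return [os.path.join(download_dir, n) for n in finished]
        time.sleep(poll_interval)
    raise TimeoutError("no finished download in %s after %s s" % (download_dir, timeout))


def process_on_worker(download_dir, store, timeout=120, poll_interval=1.0):
    """Wait for the download, pass each file to `store`, then remove the local copy."""
    results = []
    for path in wait_for_download(download_dir, timeout, poll_interval):
        # `store` is any callable taking a local path, e.g. an S3 upload helper.
        results.append(store(path))
        # The dyno's disk is ephemeral anyway, so nothing is lost by cleaning up.
        os.remove(path)
    return results
```

Polling the directory this way could stand in for the fixed `time.sleep(5)` at the end of the task, which may return before a larger video has finished downloading.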
