Why did a playwright-python app fail when run in Docker? Headless=False?

I have a small application that uses FastAPI and Playwright to scrape data and send it back to the client. The program works properly when I run it locally, but when I try to run it as a Docker image it fails with the following error:

Looks like you launched a headed browser without having a XServer running.
Set either 'headless: true' or use 'xvfb-run <your-playwright-app>' before running Playwright.

Obviously I tried running it in Headless=True mode, but then the code fails with this error:

net::ERR_EMPTY_RESPONSE at https://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true
logs
navigating to "https://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true", waiting until "load"

I also tried running it locally with Headless=True, and it failed with a "Timeout 30000ms exceeded" error.

This is the function I'm using to return the page HTML:

    def extract_html(self):
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto('https://book.flygofirst.com/Flight/Select?inl={}&CHD={}&s=True&o1={}&d1={}&ADT={}&dd1={}&gl=0&glo=0&cc=INR&mon=true'.format(self.infants,  self.children , self.origin,  self.destination,  self.adults, self.date))
            html = page.inner_html('#sectionBody')
            return html

and this is my Dockerfile:

FROM python:3.9-slim

COPY ../../requirements/dev.txt ./

RUN python3 -m ensurepip
RUN pip install -r dev.txt
RUN playwright install 
RUN playwright install-deps 

ENV PYTHONPATH "${PYTHONPATH}:/app/"
WORKDIR /code/src

COPY ./src /app

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

I hope someone can figure out what I'm doing wrong.

Answer:

After investigating and trying several things, it looks like the problem is the browser's user agent in headless mode. For some reason the page does not like the default headless user agent, so try setting one explicitly:

def extract_html(self):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36')
        page.goto('http://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true')
        html = page.inner_html('#sectionBody')
        return html
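
If you would rather not hard-code a version-specific user agent string, here is a minimal sketch of one alternative: read the browser's default user agent and strip the headless marker from it. This assumes the default headless Chromium user agent contains the token "HeadlessChrome", which may not hold for every Playwright or Chromium build. The helper name headed_user_agent and the url and selector parameters are made up for illustration and are not part of Playwright or of the original code.

```python
def headed_user_agent(default_ua: str) -> str:
    """Return the user agent with the headless marker replaced.

    Assumption: headless Chromium reports "HeadlessChrome/<version>" where a
    headed browser reports "Chrome/<version>". If the marker is absent, the
    string is returned unchanged.
    """
    return default_ua.replace("HeadlessChrome", "Chrome")


def extract_html(url: str, selector: str = "#sectionBody") -> str:
    """Load `url` in headless Chromium and return the inner HTML of `selector`."""
    # Imported here so the helper above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            # Open a throwaway page just to read the default user agent.
            probe = browser.new_page()
            default_ua = probe.evaluate("navigator.userAgent")
            probe.close()

            # Open the real page with the headless marker removed.
            page = browser.new_page(user_agent=headed_user_agent(default_ua))
            page.goto(url)
            return page.inner_html(selector)
        finally:
            browser.close()
```

Whether this is enough depends on how the site detects headless browsers. If it checks more than the user agent, the request may still be rejected, and running a headed browser under xvfb-run, as the original error message suggests, would be the other option to try.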