
Docker image comes with 8 Critical and 34 High Vulnerabilities #506

Open
andora2 opened this issue May 18, 2024 · 3 comments

Comments

@andora2

andora2 commented May 18, 2024

Hi,

Please find below a less vulnerable Docker setup as an improvement suggestion.
It reduces the problem from this [8C, 34H, 32M, 98L issues]:
..> docker scout quickview
[screenshot: docker scout quickview output]
to this [0C, 1H, 3M, 0L issues]:
[screenshot: docker scout quickview output]

The main fix is to use Alpine instead of Debian bullseye (bookworm removed the critical vulnerabilities but still had quite a few high-severity ones).
Using Alpine required some extra work to get playwright and pymupdf installed, but it finally worked out.

The app works like a charm.

That said, I think the Dockerfile's layer structure could benefit from some improvement as well.

Please check it out yourself, and update the Dockerfile and requirements.txt for the sake of less vulnerable instances out there :o)
Regarding requirements.txt: you just have to exclude playwright and pymupdf, since they are installed separately in the Dockerfile (not necessarily a final solution, but it was good enough for me).

Here is the Dockerfile:

FROM python:3.11-alpine as install-browser

# Install required packages
RUN apk update && apk add --no-cache \
    chromium \
    chromium-chromedriver \
    firefox-esr \
    nodejs \
    npm \
    wget \
    tar \
    bash \
    build-base \
    libffi-dev \
    gcc \
    g++ \
    make \
    libc-dev \
    linux-headers \
    libxml2-dev \
    libxslt-dev \
    rust \
    cargo \
    openssl-dev \
    jpeg-dev \
    zlib-dev \
    freetype-dev \
    lcms2-dev \
    openjpeg-dev \
    tiff-dev \
    tk-dev \
    tcl-dev \
    harfbuzz-dev \
    fribidi-dev \
    libjpeg-turbo-dev \
    cairo-dev \
    pango-dev \
    giflib-dev \
    poppler-utils \
    poppler-dev \
    tesseract-ocr \
    leptonica-dev \
    musl-dev

    
# Check versions
RUN chromium-browser --version && chromedriver --version

# Install Geckodriver
RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz \
    && tar -xvzf geckodriver-v0.33.0-linux64.tar.gz \
    && chmod +x geckodriver \
    && mv geckodriver /usr/local/bin/

# Set env var to suppress pip's "running as root" warning
ENV PIP_ROOT_USER_ACTION=ignore

# Set environment variables for Playwright
# (spad.uk) https://www.spad.uk/posts/making-playwright-work-on-alpine-out-of-spite/
# The main problem with running Playwright on Alpine Linux is compatibility with the musl libc library: Playwright and its dependencies are primarily built for glibc, which is not available on Alpine Linux.
# https://stackoverflow.com/questions/75581790/how-to-get-playwright-browser-tests-running-on-alpine-docker-container
# One approach to running Playwright on Alpine is to install Node.js and Chromium from the Alpine repositories and configure Playwright to use these installations instead of its own drivers. 
# Using Node.js and Chromium from Alpine Repositories
ENV PLAYWRIGHT_BROWSERS_PATH=/usr/lib/chromium/
ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1

# Create APP dir
RUN mkdir /usr/src/app
WORKDIR /usr/src/app

# Copy and install Python dependencies
COPY ./requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt


# Install Playwright without Browser
RUN npm install -g playwright

# Set Playwright Env. Vars
ENV PLAYWRIGHT_BROWSERS_PATH=/usr/bin
ENV PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium-browser
ENV PLAYWRIGHT_FIREFOX_EXECUTABLE_PATH=/usr/local/bin/firefox
ENV PLAYWRIGHT_WEBKIT_EXECUTABLE_PATH=/usr/bin/webkit

# Install PyMuPDF
RUN pip install --no-cache-dir pymupdf

# Change to unprivileged user
RUN adduser -D -s /bin/bash gpt-researcher \
    && chown -R gpt-researcher:gpt-researcher /usr/src/app

USER gpt-researcher

# Copy rest of the code
COPY --chown=gpt-researcher:gpt-researcher ./ ./

# Expose Port 8000
EXPOSE 8000

# Start the APP
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
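Because several `ENV` lines above hard-code executable paths, a quick check inside the built container can show which of them actually exist. This is a hypothetical sketch (the `missing_executables` helper is not part of the project): it only tests whether each path is an executable file and does not verify that Playwright honors these variables.

```python
import os
from typing import Dict, Mapping

# Variable names copied from the Dockerfile above; whether Playwright itself
# reads all of them is an assumption this script does not check.
EXECUTABLE_VARS = (
    "PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH",
    "PLAYWRIGHT_FIREFOX_EXECUTABLE_PATH",
    "PLAYWRIGHT_WEBKIT_EXECUTABLE_PATH",
)


def missing_executables(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return {variable: path} for every variable that is set but does not
    point to an executable file; unset variables are ignored."""
    problems: Dict[str, str] = {}
    for name in EXECUTABLE_VARS:
        path = env.get(name)
        if path and not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems[name] = path
    return problems


if __name__ == "__main__":
    for name, path in missing_executables().items():
        print(f"{name} points to a missing or non-executable file: {path}")
```

Run inside the container, any line it prints names a path that the image does not provide and that should be fixed or dropped.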

@assafelovic
Owner

@ElishaKay
Contributor

ElishaKay commented May 18, 2024

@andora2 what machine are you using?

I tried out this Dockerfile on a Mac on the master branch, and the build failed with the error below.
Also, does the Dockerfile you propose produce a lighter or a heavier image?
Feel free to create a PR with the proposed changes (it seems like you want to add some things to requirements.txt as well), and we'll take it from there.

3.908 ERROR: Could not find a version that satisfies the requirement playwright (from versions: none)
3.908 ERROR: No matching distribution found for playwright
------
failed to solve: process "/bin/sh -c pip install --no-cache-dir -r requirements.txt" did not complete successfully: exit code: 1

@andora2
Author

andora2 commented May 18, 2024

Hi,

Machine: Windows 10.
The error you got occurs because playwright is still listed in requirements.txt, and installing it that way is bound to fail on Alpine.
Alpine forces us to handle playwright and pymupdf separately, in the Dockerfile itself (I did mention that in my suggestion).
So no, I didn't have to add anything to requirements.txt; rather, I commented out playwright and pymupdf (please check the Dockerfile changes and my suggestion again; it is mentioned there).
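The "from versions: none" error in the build log above is consistent with pip finding no playwright distribution compatible with a musl-based system. One quick way to confirm that an image is musl-based is to look for the musl dynamic loader. This is a heuristic sketch under that assumption (the `is_musl` helper is hypothetical), not an official detection method.

```python
import glob
import os


def is_musl(root: str = "/") -> bool:
    """Heuristic: return True if a musl dynamic loader such as
    /lib/ld-musl-x86_64.so.1 exists under `root`. Alpine images ship one;
    glibc-based images such as Debian do not."""
    pattern = os.path.join(root, "lib", "ld-musl-*.so.1")
    return bool(glob.glob(pattern))


if __name__ == "__main__":
    print("musl libc detected" if is_musl() else "no musl loader found (likely glibc)")
```

The `root` parameter only exists so the check can be pointed at an unpacked image filesystem; inside a running container the default `/` is what you want.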

I would have opened a PR, but it needs some clean-code tidying and unfortunately I won't get to it any time soon (if at all). I had to solve this issue for a specific use case, nothing more than that. I thought I could at least let you know.

Take care,
Adrian

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants