I am trying to store my model artifacts with mlflow in S3. In the API service we use MLFLOW_S3_ENDPOINT_URL as the S3 bucket, and in the mlflow service we pass it as an environment variable. However, the mlflow container fails with the exception below:
mflow_server | botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: Not supported URL scheme s3
My docker-compose file is as below:
version: "3.3"
services:
  prisim-api:
    image: prisim-api:latest
    container_name: prisim-api
    expose:
      - "8000"
    environment:
      - S3_URL=s3://mlflow-automation-artifacts/
      - MLFLOW_SERVER=http://mlflow:5000
      - AWS_ID=xyz
      - AWS_KEY=xyz
    networks:
      - prisim
    depends_on:
      - mlflow
    links:
      - mlflow
    volumes:
      - app_data:/usr/data
  mlflow:
    image: mlflow_server:latest
    container_name: mflow_server
    ports:
      - "5000:5000"
    environment:
      - AWS_ACCESS_KEY_ID=xyz
      - AWS_SECRET_ACCESS_KEY=xyz
      - MLFLOW_S3_ENDPOINT_URL=s3://mlflow-automation-artifacts/
    healthcheck:
      test: ["CMD", "echo", "mlflow server is running"]
      interval: 1m30s
      timeout: 10s
      retries: 3
    networks:
      - prisim
networks:
  prisim:
volumes:
  app_data:
Why is the s3 scheme not supported?
CodePudding user response:
I have zero experience with mlflow, but from what I see in the documentation, you are using the wrong environment variable to set the S3 bucket. To be more precise, there seems to be no environment variable for what you are trying to do.
MLFLOW_S3_ENDPOINT_URL should only be used when you are not using AWS for S3, and it expects a normal API URL (starting with http/https). From the documentation:
To store artifacts in a custom endpoint, set the MLFLOW_S3_ENDPOINT_URL to your endpoint’s URL. For example, if you have a MinIO server at 1.2.3.4 on port 9000:
export MLFLOW_S3_ENDPOINT_URL=http://1.2.3.4:9000
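Applied to your compose file, that means the mlflow service should either drop the variable entirely (if the bucket lives on plain AWS S3) or point it at an http(s) endpoint. A minimal sketch of the environment block, assuming your bucket is on AWS and with a MinIO endpoint shown only as a placeholder:

  mlflow:
    environment:
      - AWS_ACCESS_KEY_ID=xyz
      - AWS_SECRET_ACCESS_KEY=xyz
      # plain AWS S3: do not set MLFLOW_S3_ENDPOINT_URL at all
      # S3-compatible server (e.g. MinIO): the value must be an http(s) URL, never s3://
      # - MLFLOW_S3_ENDPOINT_URL=http://minio:9000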
I also came across a GitHub repository that builds Docker images for the project, and they do it like this:
#!/bin/sh
set -e

if [ -z "$FILE_DIR" ]; then
  echo >&2 "FILE_DIR must be set"
  exit 1
fi

if [ -z "$AWS_BUCKET" ]; then
  echo >&2 "AWS_BUCKET must be set"
  exit 1
fi

mkdir -p "$FILE_DIR" && mlflow server \
  --backend-store-uri sqlite:///${FILE_DIR}/sqlite.db \
  --default-artifact-root s3://${AWS_BUCKET}/artifacts \
  --host 0.0.0.0 \
  --port $PORT
Searching for this flag and an environment-variable equivalent of it brought me back to the same documentation, and it does not list an environment variable for it.
Besides that, I also came across a code example that lets you set the S3 bucket in code. So you could also parse the environment variable in code and set it like this:
import mlflow

mlflow.set_tracking_uri("your_postgres_uri")  # replace
expr_name = "new_experiment_2"                # replace
s3_bucket = "your_s3_bucket_uri"              # replace
mlflow.create_experiment(expr_name, s3_bucket)
mlflow.set_experiment(expr_name)
with mlflow.start_run():
    pass  # your code
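Since your prisim-api container already receives S3_URL and MLFLOW_SERVER, a sketch of how you could read those instead of hardcoding the values (the experiment name is just an example):

import os
import mlflow

# Values come from the environment set in the compose file
mlflow.set_tracking_uri(os.environ["MLFLOW_SERVER"])  # e.g. http://mlflow:5000
s3_bucket = os.environ["S3_URL"]                       # e.g. s3://mlflow-automation-artifacts/

expr_name = "new_experiment_2"
if mlflow.get_experiment_by_name(expr_name) is None:
    mlflow.create_experiment(expr_name, artifact_location=s3_bucket)
mlflow.set_experiment(expr_name)

with mlflow.start_run():
    pass  # your training / logging code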
CodePudding user response:
I found the solution.
I added AWS_DEFAULT_REGION to the environment variables and it worked.
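A sketch of how the addition looks in the mlflow service's environment block (the region value is a placeholder; use the region your bucket actually lives in, and keep or drop MLFLOW_S3_ENDPOINT_URL as discussed in the answer above):

    environment:
      - AWS_ACCESS_KEY_ID=xyz
      - AWS_SECRET_ACCESS_KEY=xyz
      - AWS_DEFAULT_REGION=us-east-1  # placeholder: your bucket's region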