I have 5 tasks in my project that need to be run periodically. Some of these tasks are run on a daily basis, some on a weekly basis.
I am trying to containerize each task in a Docker image. Here is one illustrative example:
FROM tensorflow/tensorflow:2.7.0
RUN mkdir /home/MyProject
COPY . /home/MyProject
WORKDIR /home/MyProject/M1/src/
RUN pip install pandas numpy
CMD ./task1.sh
There is a list of Python scripts that need to be run in the task1.sh file defined above. This is not a server application or anything similar: the container will run task1.sh, which runs all the Python scripts defined in it one by one, and the entire process will finish within minutes. The same process is supposed to be repeated 24 hours later.
How can I schedule such Docker containers in GCP? Are there different ways of doing it? If there are multiple solutions, which one is comparatively simpler?
I am not a dev-ops expert by any means. All examples in documentation I find are explained for server applications which are running all the time, not like my example where the image needs to be run just once periodically. This topic is quite daunting for a beginner in this domain like myself.
ADDENDUM:
Looking at Google's documentation for cronjobs in GKE on the following page: https://cloud.google.com/kubernetes-engine/docs/how-to/cronjobs
I find the following cronjob.yaml file:
# cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Allow
  startingDeadlineSeconds: 100
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo "Hello, World!"
          restartPolicy: OnFailure
It is stated that this cronjob prints the current time and a string once every minute.
But the page is written on the assumption that you already understand everything on it, in which case you would not need to read the documentation!
Let's say that I have an image that I would like to run once every day, and that its name is, say, my_image.
I assume that I am supposed to change the following part for my own image.
containers:
- name: hello
  image: busybox
  args:
  - /bin/sh
  - -c
  - date; echo "Hello, World!"
It is a total mystery what these names and arguments mean.
name: hello
I suppose it is just a user-selected name and does not have any practical importance.
image: busybox
Is this busybox the base image? If not, what is that? It says NOTHING about what this busybox thing is and where it comes from!
args:
- /bin/sh
- -c
- date; echo "Hello, World!"
And based on the explanation on the page, this is the part that prints the date and the "Hello, World!" string to the screen.
Ok... So, how do I modify this template to create a cronjob out of my own image my_image? This documentation does not help at all!
CodePudding user response:
I agree with @guillaume blaquiere. Also, Autopilot GKE is designed to reduce the operational cost of managing clusters, optimize your clusters for production, and yield higher workload availability. The mode of operation refers to the level of flexibility, responsibility, and control that you have over your cluster. In addition to the benefits of a fully managed control plane and node automations, GKE offers two modes of operation:
- Autopilot: GKE provisions and manages the cluster's underlying infrastructure, including nodes and node pools, giving you an optimized cluster with a hands-off experience.
- Standard: You manage the cluster's underlying infrastructure, giving you node configuration flexibility.
I hope this helps; see the Autopilot overview for details.
CodePudding user response:
I will answer your comment here, because the second part of your question is too long to answer in a comment.
Don't be afraid, it's just a Kubernetes API definition. You declare what you want to the control plane, and it is in charge of making your wishes happen!
# cronjob.yaml
apiVersion: batch/v1 # The API that you call
kind: CronJob # The type of object/endpoint in that API
metadata:
  name: hello # The name of your job definition
spec:
  schedule: "*/1 * * * *" # Your schedule; change it to "0 10 * * *" to run your job every day at 10:00 am
  concurrencyPolicy: Allow # config stuff, deep dive later
  startingDeadlineSeconds: 100 # config stuff, deep dive later
  suspend: false # config stuff, deep dive later
  successfulJobsHistoryLimit: 3 # config stuff, deep dive later
  failedJobsHistoryLimit: 1 # config stuff, deep dive later
  jobTemplate: # Your execution definition
    spec:
      template:
        spec:
          containers:
          - name: hello # Custom name of your container. Only there to help you with debugging, logs, ...
            image: busybox # Image of your container, can be gcr.io/projectID/myContainer for example
            args: # Args to pass to your container; they override the image's CMD. To change the entrypoint (the binary that runs and receives the args), set "command"
            - /bin/sh
            - -c
            - date; echo "Hello, World!"
            # You can also use "command" to run the command with the args directly. In fact it's WHAT you start in your container to perform the job.
          restartPolicy: OnFailure # Config in case of failure.
You can find more details on the API definition here
Here is the API definition of a container, with all the possible values to customize it.
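Applied to your case, the template could be adapted roughly as follows. This is only a minimal sketch under a few assumptions: the image has been pushed to a registry the cluster can pull from (the path gcr.io/projectID/my_image is a placeholder, as are the file name and the names task1-daily and task1), and the image's own CMD (./task1.sh in your Dockerfile) is what should run, so no args or command are set.

```yaml
# task1-cronjob.yaml (hypothetical file name)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: task1-daily              # hypothetical name for the job definition
spec:
  schedule: "0 10 * * *"         # every day at 10:00
  concurrencyPolicy: Forbid      # do not start a new run while the previous one is still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: task1          # hypothetical container name, only shows up in logs/debugging
            image: gcr.io/projectID/my_image   # placeholder path; replace with where your image lives
            # No "command"/"args" here: the container runs the CMD baked into the image (./task1.sh)
          restartPolicy: OnFailure
```

You would then create it with kubectl apply -f task1-cronjob.yaml. For one of the weekly tasks, the same file should work with a different name, image, and a schedule such as "0 10 * * 1" (every Monday at 10:00).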