I want to design a pipeline for executing a program that can take multiple configuration arguments. The developers do not want a separate variable for each argument; they want the option to add multiple values through the pipeline. We use bash, GitLab CI for development, and Octopus for deployment to the UAT environment.
example:
spark2-submit \
--master $MASTER \
--name $NAME \
--queue $QUEUE \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.executorIdleTimeout=12
As you can see in the example above, I want the flexibility to add more "--conf" parameters.
Should I have a dummy parameter and append it to the end of this command?
For example:
spark2-submit \
--master $MASTER \
--name $NAME \
--queue $QUEUE \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.executorIdleTimeout=12 \
$additional_param
I am using GitLab for my code repository and Octopus for CI/CD, with bash for deployment. I am looking for a flexible option that lets me use the full variable features of both Octopus and GitLab. What is your recommendation? Do you have a better suggestion?
CodePudding user response:
This is what Charles is hinting at with "Lists of arguments should be stored in arrays":
spark2_opts=(
    --master "$MASTER"
    --name "$NAME"
    --queue "$QUEUE"
    --conf spark.shuffle.service.enabled=true
    --conf spark.dynamicAllocation.enabled=true
    --conf spark.dynamicAllocation.executorIdleTimeout=12
)
# add additional options, through some mechanism such as the environment:
if [[ -n "$SOME_ENV_VAR" ]]; then
    spark2_opts+=( --conf "$SOME_ENV_VAR" )
fi
# and execute
spark2-submit "${spark2_opts[@]}"
Bash array definitions can contain arbitrary whitespace, including newlines, so format for readability.
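If the CI system supplies several extra settings in a single variable (for example a newline-separated EXTRA_SPARK_CONFS defined in Octopus or GitLab; the variable name here is just an illustrative assumption), each entry can be appended as its own --conf pair. A minimal sketch:

```shell
#!/usr/bin/env bash
# Hypothetical CI variable: newline-separated key=value pairs.
EXTRA_SPARK_CONFS=$'spark.executor.memory=4g\nspark.executor.cores=2'

spark2_opts=( --name demo )

# Split on newlines and append one --conf per non-empty entry.
while IFS= read -r conf; do
    [[ -n "$conf" ]] && spark2_opts+=( --conf "$conf" )
done <<< "$EXTRA_SPARK_CONFS"

# Show the final argument list, one element per line.
printf '%s\n' "${spark2_opts[@]}"
```

This keeps the pipeline definition free of per-argument variables: adding another setting only means adding a line to the CI variable, never touching the deployment script.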