What is the expected token for declaring spark-submit (Python script) arguments in a shell script with specific directory inputs?



What is the expected token for declaring spark-submit (Python script) arguments in a shell script with specific directory inputs? What punctuation should I use?

I already tried <, but it doesn't work.


I'll give as much detail as possible so my situation is clear. My inputs are in:

sys.argv[1] for dataset_1, dataset_2, dataset_3

sys.argv[2] for dataset_4

sys.argv[3] for dataset_5

sys.argv[4] for dataset_6

My output is in:

sys.argv[5] for the output path

Additional inputs are in:

sys.argv[6] for year

sys.argv[7] for month

Here is part of the script (the file name is cs_preDeploy.py):

import datetime
import os
import sys

input_path_1 = os.path.join(sys.argv[1], 'dataset_1')

input_path_2 = os.path.join(sys.argv[1], 'dataset_2')

input_path_3 = os.path.join(sys.argv[1], 'dataset_3')

# /tmp/sdsid/encrypted_dataset/328b7446-1862-4489-b1b4-57fa55fe556a/0/dataset_4/year=2021/month=2
input_path_4 = os.path.join(sys.argv[2], 'dataset_4')

# /tmp/sdsid/encrypted_dataset/3119bdd9-c7a8-44c3-b3f8-e49a86261106/0/dataset_5/year=2021/month=2
input_path_5 = os.path.join(sys.argv[3], 'dataset_5')

# /tmp/sdsid/encrypted_dataset/efc84a0f-52e9-4dff-91a1-56e1d7aa02cb/0/dataset_6/year=2021/month=2
input_path_6 = os.path.join(sys.argv[4], 'dataset_6')

output_path = sys.argv[5]

#query_year = sys.argv[6]
#query_month = sys.argv[7]
# Year/month to process: taken from the command line when given,
# otherwise the current date

if len(sys.argv) > 7:
    year = int(sys.argv[6])   # year
    month = int(sys.argv[7])  # month
else:
    month_obs = datetime.datetime.today()
    month = month_obs.month
    year = month_obs.year
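For reference, here is a minimal, self-contained sketch of the same argument handling wrapped in a function so it can be tried without Spark. It assumes the positional layout listed above (sys.argv[1] to sys.argv[5] are directories, sys.argv[6] and sys.argv[7] are the optional year and month); the function name parse_cli_args, the dictionary keys, and the today parameter are hypothetical, added only to make the year/month fallback testable.

```python
import datetime
import os


def parse_cli_args(argv, today=None):
    """Map a sys.argv-style list to input/output paths and a year/month.

    Hypothetical helper: argv[0] is the script name and argv[1:] are the
    arguments passed after the script on the spark-submit command line.
    """
    if len(argv) < 6:
        raise ValueError(
            "usage: cs_preDeploy.py <base_dir> <dir_4> <dir_5> <dir_6> "
            "<output_dir> [year month]"
        )

    args = {
        # dataset_1, dataset_2 and dataset_3 share one parent directory
        "input_path_1": os.path.join(argv[1], "dataset_1"),
        "input_path_2": os.path.join(argv[1], "dataset_2"),
        "input_path_3": os.path.join(argv[1], "dataset_3"),
        # dataset_4..6 each get their own parent directory
        "input_path_4": os.path.join(argv[2], "dataset_4"),
        "input_path_5": os.path.join(argv[3], "dataset_5"),
        "input_path_6": os.path.join(argv[4], "dataset_6"),
        "output_path": argv[5],
    }

    if len(argv) > 7:
        # Year and month were given explicitly (sys.argv[6], sys.argv[7])
        args["year"] = int(argv[6])
        args["month"] = int(argv[7])
    else:
        # Otherwise use the supplied date, or today's date
        today = today or datetime.date.today()
        args["year"] = today.year
        args["month"] = today.month
    return args
```

Called with the eight-element argv of a full run, it returns year 2021 and month 7; called with only the five paths, it falls back to the date passed as today (or the current date).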

Here's my first try

[sdsid@user algorithm]$ PYSPARK_PYTHON=/usr/bin/python3 ./bin/spark-submit \
> --master yarn \
> --deploy-mode cluster \
> --driver-memory 16g \
> --executor-memory 16g \
> --num-executors 5 \
> --executor-cores 1 \
> ./home/sdsid/algorithm/cs_preDeploy.py

The output:

-bash: ./bin/spark-submit: No such file or directory

Here's the second try. I pass 2021 as the year (sys.argv[6]) and 7, i.e. July, as the month (sys.argv[7]):

[sdsid@user algorithm]$ nohup spark-sumbit cs_preDeploy.py </tmp/sdsid/sample_dataset/></tmp/sdsid/sample_dataset/dataset_4></tmp/sdsid/sample_dataset/dataset_5></tmp/sdsid/sample_dataset/dataset_6></tmp/sdsid/sample_output/dataset_output/> 2021 7

The error message

-bash: syntax error near unexpected token `<'
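The syntax error comes from bash itself, before spark-submit even starts: < and > are redirection operators (read standard input from a file, write standard output to a file), not brackets around arguments. Python's standard shlex module can approximate a POSIX shell's tokenizer when punctuation_chars=True, so the sketch below (an illustration, not bash's actual parser; shell_tokens is a hypothetical name) shows the angle brackets in the second try being split off as operator tokens instead of staying attached to the paths.

```python
import shlex


def shell_tokens(command):
    """Split a command line into words and operator tokens.

    With punctuation_chars=True, shlex returns runs of the characters
    ();<>|& as separate tokens, similar to a POSIX shell's tokenizer.
    """
    return list(shlex.shlex(command, posix=True, punctuation_chars=True))


# The second try, shortened to its first two paths
failed = (
    "spark-submit cs_preDeploy.py "
    "</tmp/sdsid/sample_dataset/></tmp/sdsid/sample_dataset/dataset_4>"
)
tokens = shell_tokens(failed)
print(tokens)
```

The paths come out as words sitting between operators rather than as arguments to cs_preDeploy.py. shlex lumps the adjacent >< into one token, whereas bash reads a > redirection followed by <; since a redirection must be followed by a file name, bash stops with "syntax error near unexpected token `<'".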

The third try:

[sdsid@user algorithm]$ nohup spark-sumbit cs_preDeploy.py <"/tmp/sdsid/sample_dataset/"><"/tmp/sdsid/sample_dataset/dataset_4"><"/tmp/sdsid/sample_dataset/dataset_5"><"/tmp/sdsid/sample_dataset/dataset_6"><"/tmp/sdsid/sample_output/dataset_output/"> 2021 7

The error message

-bash: syntax error near unexpected token `<'
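spark-submit hands everything written after the application file to the Python script unchanged, so the paths only need to be separated by spaces; no < or > is needed. The quotes in the third try do not help because the angle brackets are still outside them. The sketch below builds the command from the question's own paths with shlex.join (Python 3.8+), which adds quoting only where a word needs it, and then splits it back with shlex.split to show which word becomes which sys.argv entry. The helper name build_spark_submit is hypothetical, and the --master/--deploy-mode values are simply copied from the first try.

```python
import shlex


def build_spark_submit(script, app_args, spark_options=()):
    """Build a spark-submit command line as a single, safely quoted string.

    Hypothetical helper: spark_options go before the script (they are read
    by spark-submit itself); app_args go after it and reach the script as
    sys.argv[1:].
    """
    argv = ["spark-submit", *spark_options, script, *map(str, app_args)]
    return shlex.join(argv)


command = build_spark_submit(
    "cs_preDeploy.py",
    [
        "/tmp/sdsid/sample_dataset/",                # sys.argv[1]
        "/tmp/sdsid/sample_dataset/dataset_4",       # sys.argv[2]
        "/tmp/sdsid/sample_dataset/dataset_5",       # sys.argv[3]
        "/tmp/sdsid/sample_dataset/dataset_6",       # sys.argv[4]
        "/tmp/sdsid/sample_output/dataset_output/",  # sys.argv[5]
        2021,                                        # sys.argv[6]: year
        7,                                           # sys.argv[7]: month
    ],
    spark_options=["--master", "yarn", "--deploy-mode", "cluster"],
)
print(command)

# Split it again the way a POSIX shell would: every word after the
# script name is what the script sees as sys.argv[1:]
words = shlex.split(command)
script_args = words[words.index("cs_preDeploy.py") + 1:]
print(script_args)
```

The printed command is what belongs in the shell script (optionally prefixed with nohup, as in the question, or continued over several lines with trailing backslashes, as in the first try). One thing worth checking: the script already appends dataset_4, dataset_5 and dataset_6 to sys.argv[2], sys.argv[3] and sys.argv[4], so passing /tmp/sdsid/sample_dataset/dataset_4 would make it read /tmp/sdsid/sample_dataset/dataset_4/dataset_4. Judging by the comments in the script, those three arguments are probably meant to be the parent directories.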

Answer:

  • -bash: ./bin/spark-submit: No such file or directory
    a. Use the full path here, such as /folder1/folder2/bin/spark-submit: ./ means the current directory, so depending on where you are, that path may not exist.

    b. Or add the directory containing spark-submit to your PATH.

  • Please refer to the link on how to submit apps to run PySpark; there are a lot of examples there.
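Point a above can be checked from Python with the standard library: shutil.which searches PATH the same way the shell does when you type a bare command name, while a relative path such as ./bin/spark-submit is resolved against the current working directory. The sketch also falls back to $SPARK_HOME/bin/spark-submit; SPARK_HOME is the environment variable Spark installations conventionally set, but whether it is set on this cluster is an assumption. find_spark_submit is a hypothetical name.

```python
import os
import shutil


def find_spark_submit(env=None):
    """Return a path to spark-submit, or None if it cannot be found.

    Hypothetical helper: searches PATH first (like typing `spark-submit`
    in the shell), then falls back to $SPARK_HOME/bin/spark-submit.
    """
    env = os.environ if env is None else env
    found = shutil.which("spark-submit", path=env.get("PATH", os.defpath))
    if found:
        return found
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        candidate = os.path.join(spark_home, "bin", "spark-submit")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# ./bin/spark-submit is relative to the current directory, not to Spark
print("relative path resolves to:", os.path.abspath("./bin/spark-submit"))
print("spark-submit found at:", find_spark_submit())
```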
