Get the equivalent DataprocCreateBatchOperator operator for Azure airflow operator


I have three operators imported from airflow.providers.google.cloud.operators.dataproc:

  1. DataprocCreateBatchOperator
  2. DataprocDeleteBatchOperator
  3. DataprocGetBatchOperator

I need equivalent operators for Azure. Does an Azure provider offer something comparable, or do I have to create a new operator?

CodePudding user response:

@Mazlum Tosun

For GCP, my code uses DataprocCreateBatchOperator like this:

create_batch = DataprocCreateBatchOperator(
    task_id="CREATE_BATCH",
    batch={
        "pyspark_batch": {
            "main_python_file_uri": f"gs://{ARTIFACT_BUCKET}/spark-jobs/main.py",
            "args": app_args,
            "python_file_uris": [
                f"gs://{ARTIFACT_BUCKET}/spark-jobs/jobs.zip",
                f"gs://{ARTIFACT_BUCKET}/spark-jobs/libs.zip",
            ],
            "jar_file_uris": test_jars,
            "file_uris": [
                f"gs://{ARTIFACT_BUCKET}/config/params.yaml",
            ],
        },
        "environment_config": {
            "peripherals_config": {
                "spark_history_server_config": {},
            },
        },
    },
    region=REGION,
    batch_id=batch_id_str,
)

CodePudding user response:

I believe the closest equivalent to the Dataproc operators in the apache-airflow-providers-microsoft-azure provider package would be the Azure Synapse Operators.

Specifically, the AzureSynapseRunSparkBatchOperator allows users to "execute a spark application within Synapse Analytics".
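As a rough sketch, a Synapse counterpart of the Dataproc batch in the question might look like the snippet below. The connection id, Spark pool name, and abfss:// storage paths are placeholders, and the SparkBatchJobOptions field names follow the azure-synapse-spark model (which mirrors the Livy batch request), so they are worth verifying against the SDK version you have installed:

```python
from airflow.providers.microsoft.azure.operators.synapse import (
    AzureSynapseRunSparkBatchOperator,
)
from azure.synapse.spark.models import SparkBatchJobOptions

# Placeholder values -- replace with your own workspace details.
run_spark_batch = AzureSynapseRunSparkBatchOperator(
    task_id="RUN_SPARK_BATCH",
    azure_synapse_conn_id="azure_synapse_default",  # Airflow connection to the Synapse workspace
    spark_pool="my-spark-pool",                     # name of the Synapse Spark pool
    payload=SparkBatchJobOptions(
        name="my-spark-job",
        # Main application file, staged in ADLS Gen2 instead of GCS:
        file="abfss://jobs@mystorageaccount.dfs.core.windows.net/spark-jobs/main.py",
        arguments=app_args,
        python_files=[
            "abfss://jobs@mystorageaccount.dfs.core.windows.net/spark-jobs/jobs.zip",
            "abfss://jobs@mystorageaccount.dfs.core.windows.net/spark-jobs/libs.zip",
        ],
        files=[
            "abfss://jobs@mystorageaccount.dfs.core.windows.net/config/params.yaml",
        ],
    ),
)
```

Note there is no one-to-one mapping for DataprocDeleteBatchOperator or DataprocGetBatchOperator; the operator submits the batch and (by default) waits for it to terminate, so separate delete/get tasks are usually unnecessary.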

If you're running Spark jobs on Azure Databricks, there are also several Databricks Operators that might be able to help.
