I have three operators imported from airflow.providers.google.cloud.operators.dataproc:
DataprocCreateBatchOperator
DataprocDeleteBatchOperator
DataprocGetBatchOperator
I need the same kind of operators for Azure. Can someone please look into this, or do I have to create a new operator?
CodePudding user response:
@Mazlum Tosun
For GCP, DataprocCreateBatchOperator is used in my code like this:
create_batch = DataprocCreateBatchOperator(
    task_id="CREATE_BATCH",
    batch={
        "pyspark_batch": {
            "main_python_file_uri": f"gs://{ARTIFACT_BUCKET}/spark-jobs/main.py",
            "args": app_args,
            "python_file_uris": [
                f"gs://{ARTIFACT_BUCKET}/spark-jobs/jobs.zip",
                f"gs://{ARTIFACT_BUCKET}/spark-jobs/libs.zip",
            ],
            "jar_file_uris": test_jars,
            "file_uris": [
                f"gs://{ARTIFACT_BUCKET}/config/params.yaml",
            ],
        },
        "environment_config": {
            "peripherals_config": {
                "spark_history_server_config": {},
            },
        },
    },
    region=REGION,
    batch_id=batch_id_str,
)
CodePudding user response:
I believe the closest equivalent to the Dataproc operators in the apache-airflow-providers-microsoft-azure provider package would be the Azure Synapse Operators. Specifically, the AzureSynapseRunSparkBatchOperator allows users to "execute a spark application within Synapse Analytics".
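As a rough sketch of how your Dataproc example might map onto it (the connection ID, Spark pool name, and abfss:// paths below are placeholders, and the exact SparkBatchJobOptions field names should be verified against your installed azure-synapse-spark version):

from azure.synapse.spark.models import SparkBatchJobOptions
from airflow.providers.microsoft.azure.operators.synapse import (
    AzureSynapseRunSparkBatchOperator,
)

# Placeholder storage root -- substitute your own container and account.
STORAGE_ROOT = "abfss://spark@<your-storage-account>.dfs.core.windows.net"

run_spark_batch = AzureSynapseRunSparkBatchOperator(
    task_id="RUN_SPARK_BATCH",
    azure_synapse_conn_id="azure_synapse_default",  # connection holding workspace details
    spark_pool="<your-spark-pool>",  # name of an existing Synapse Spark pool
    payload=SparkBatchJobOptions(
        name="spark-job",
        file=f"{STORAGE_ROOT}/spark-jobs/main.py",
        arguments=app_args,  # roughly equivalent to "args" in the Dataproc batch
        python_files=[
            f"{STORAGE_ROOT}/spark-jobs/jobs.zip",
            f"{STORAGE_ROOT}/spark-jobs/libs.zip",
        ],
        files=[f"{STORAGE_ROOT}/config/params.yaml"],
    ),
)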
If you're running Spark jobs on Azure Databricks, there are also several Databricks Operators that might be able to help.
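For example, submitting a similar PySpark job with DatabricksSubmitRunOperator from the apache-airflow-providers-databricks package could look like the sketch below (the connection ID, cluster spec, and dbfs:// paths are placeholders to adapt to your workspace):

from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

submit_spark_job = DatabricksSubmitRunOperator(
    task_id="SUBMIT_SPARK_JOB",
    databricks_conn_id="databricks_default",  # connection with workspace host/token
    json={
        # Ephemeral job cluster; adjust runtime version and node sizes as needed.
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
        "spark_python_task": {
            "python_file": "dbfs:/spark-jobs/main.py",
            "parameters": app_args,  # mirrors "args" in the Dataproc batch
        },
    },
)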