I have a scenario in which we connect Apache Spark to SQL Server, load table data into Spark, and generate a Parquet file from it.
Here is a snippet of my code:
val database = "testdb"
val jdbcDF = (spark.read.format("jdbc")
.option("url", "jdbc:sqlserver://DESKTOP-694SPLH:1433;integratedSecurity=true;databaseName=" database)
.option("dbtable", "employee")
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.load())
jdbcDF.write.parquet("/tmp/output/people.parquet")
It works fine in spark-shell, but I want to automate it from Windows PowerShell or a Windows command script (batch file) so that it can become part of a SQL Server job.
I would appreciate any suggestions or leads.
CodePudding user response:
I have been able to do it myself; I will list the steps below so anyone can get help from them.
- Put your spark-shell code into a Scala file as a standalone Scala application (a sketch follows after this list).
- Build the Spark Scala app with SBT or Maven, declaring the Spark dependencies (a sample build.sbt also follows below).
- Make sure you can compile and run your Spark Scala app successfully on its own.
- Package or assemble your Scala app into a JAR file; assembly produces a fat JAR that bundles the dependencies. I used sbt-assembly.
- Use spark-submit to call your app's JAR from a Windows batch file (e.g. spark-submit --class &lt;your main class&gt; --master local[*] &lt;path to the JAR&gt;); running that batch file from the SQL Server job then automates your Spark code.
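For the first step, here is a minimal sketch of what the standalone Scala app could look like. The object name ExportEmployee is just an illustrative choice; the connection details, table name, and output path are the ones from the question, so adapt them to your environment:

import org.apache.spark.sql.SparkSession

object ExportEmployee {
  def main(args: Array[String]): Unit = {
    // Build the SparkSession that spark-shell normally provides for you.
    val spark = SparkSession.builder()
      .appName("ExportEmployee")
      .getOrCreate()

    val database = "testdb"

    // Read the employee table from SQL Server over JDBC.
    val jdbcDF = spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://DESKTOP-694SPLH:1433;integratedSecurity=true;databaseName=" + database)
      .option("dbtable", "employee")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load()

    // Write the result out as Parquet.
    jdbcDF.write.parquet("/tmp/output/people.parquet")

    spark.stop()
  }
}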
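For the build and assembly steps, a build.sbt along these lines should work with sbt-assembly. The Spark, Scala, JDBC driver, and plugin versions below are assumptions for illustration; pick the ones that match the Spark/Scala versions of your spark-shell:

name := "spark-sqlserver-export"
version := "0.1"
scalaVersion := "2.12.15"  // assumed; match your Spark build

libraryDependencies ++= Seq(
  // Spark is supplied by spark-submit at runtime, so mark it "provided"
  // to keep the assembled fat JAR small.
  "org.apache.spark" %% "spark-sql" % "3.3.0" % "provided",
  // SQL Server JDBC driver, bundled into the fat JAR.
  "com.microsoft.sqlserver" % "mssql-jdbc" % "9.4.1.jre8"
)

and in project/plugins.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

Running sbt assembly then produces the fat JAR under target/scala-2.12/, which is the file you point spark-submit at from the batch file.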