Spark Shell code automation using PowerShell or a Windows batch file


I have a scenario in which we connect Apache Spark to SQL Server, load table data into Spark, and generate a Parquet file from it.

Here is a snippet of my code:

val database = "testdb" 
val jdbcDF = (spark.read.format("jdbc")
.option("url",  "jdbc:sqlserver://DESKTOP-694SPLH:1433;integratedSecurity=true;databaseName=" database)
.option("dbtable", "employee")
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") 
.load())
jdbcDF.write.parquet("/tmp/output/people.parquet")

It works fine in spark-shell, but I want to automate it with Windows PowerShell or a Windows command script (batch file) so that it can run as part of a SQL Server job.

I would appreciate any suggestions or leads.

CodePudding user response:

I have been able to do it myself. I will list the steps below so anyone can get help from them.

  1. Put your spark-shell code into a Scala file as a standalone Scala app (see the sketch after this list).
  2. Build the Spark Scala app using SBT or Maven with the Spark dependencies declared (a sample build.sbt also follows).
  3. Confirm that you can successfully compile and run your Spark Scala app.
  4. Package or assemble your Scala app into a jar file; Assembly makes a fat jar file, and that is what I used.
  5. Use spark-submit to call the jar file of your Spark app from a Windows batch file; this will automate your Spark code (a batch-file sketch is below).
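
Here is a minimal sketch of step 1, adapting the snippet from the question into a standalone app. The object name TableToParquet is just a placeholder; the key difference from spark-shell is that the app has to create its own SparkSession.

import org.apache.spark.sql.SparkSession

// Placeholder object name; it must match the --class argument passed
// to spark-submit later.
object TableToParquet {
  def main(args: Array[String]): Unit = {
    // spark-shell provides a SparkSession for you; a standalone app
    // must build its own.
    val spark = SparkSession.builder()
      .appName("TableToParquet")
      .getOrCreate()

    val database = "testdb"
    val jdbcDF = spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://DESKTOP-694SPLH:1433;integratedSecurity=true;databaseName=" + database)
      .option("dbtable", "employee")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load()

    jdbcDF.write.parquet("/tmp/output/people.parquet")
    spark.stop()
  }
}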
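
For step 2 with SBT, a minimal build.sbt might look like the following. The version numbers are illustrative and should match the Spark and Scala versions on your machine; the sbt-assembly plugin itself is added separately in project/plugins.sbt.

// Marking Spark as "provided" keeps it out of the fat jar, since
// spark-submit supplies it at runtime; the JDBC driver must be bundled.
name := "table-to-parquet"
version := "0.1"
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided",
  "com.microsoft.sqlserver" % "mssql-jdbc" % "9.4.1.jre8"
)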
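
And for step 5, a batch file along these lines can then be scheduled from a SQL Server job (for example as a CmdExec step). The SPARK_HOME path, class name, and jar path are assumptions to adjust for your environment.

@echo off
rem run_spark_job.cmd -- illustrative sketch; adjust SPARK_HOME, the
rem class name, and the jar path for your setup.
set SPARK_HOME=C:\spark
"%SPARK_HOME%\bin\spark-submit.cmd" ^
  --class TableToParquet ^
  --master local[*] ^
  C:\jobs\table-to-parquet-assembly-0.1.jar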