I am looking for a clear guide or steps to installing Spark packages (specifically spark-avro) to run locally and correctly using them with spark-submit command.
I've spent a lot of time reading many posts and guides, but still not able to get spark-submit to use the locally deployed spark-avro package. Hence, if someone has already accomplished this with spark-avro or another package, please share your wisdom :)
All the existing documentation I found is a bit unclear.
Clear steps and examples would be much appreciated! P.S. I know Python/PySpark/SQL, but not much Java (yet) ...
Michael
CodePudding user response:
In spark-submit command itself you can pass avro package details (make sure avro and spark version support)
spark-submit --packages org.apache.spark:spark-avro_<required_version>:<spark_version>
Example,
spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0
same way you can pass it along with spark-shell command as well to work on avro files.