How to install external python libraries in Pyspark?


While writing some PySpark code, I needed to install a Python module called fuzzywuzzy (which I used to compute the Levenshtein distance).

This is a Python library, and it seems that PySpark doesn't have the module installed... so, how can I install this module inside PySpark?

CodePudding user response:

You'd use pip as normal, with the caveat that Spark can run on multiple machines, so every machine in the Spark cluster (depending on your cluster manager) needs the same package installed, at the same version.
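A minimal sketch of that approach, run on the driver and on each worker node (the version pin shown is illustrative, not required):

```shell
# Install the dependency into the Python environment that Spark uses.
# Pin the version so driver and executors stay in sync.
pip install fuzzywuzzy==0.18.0

# Optional: fuzzywuzzy warns without this C-accelerated backend.
pip install python-Levenshtein
```

If Spark is configured to use a specific interpreter (e.g. via the `PYSPARK_PYTHON` environment variable), make sure pip targets that same interpreter on every node.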

Alternatively, you can pass zip, whl, or egg files to spark-submit with the --py-files argument; these are shipped with the job and unbundled onto the executors' Python path during code execution.
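A sketch of the --py-files route; `deps/` and `my_job.py` are hypothetical names for a local staging directory and your application script:

```shell
# Install the dependency into a local directory instead of site-packages.
pip install fuzzywuzzy -t deps/

# Zip the directory contents (the package must sit at the archive root).
cd deps && zip -r ../deps.zip . && cd ..

# Ship the archive with the job; Spark distributes it to every executor
# and adds it to PYTHONPATH, so `import fuzzywuzzy` works in your code.
spark-submit --py-files deps.zip my_job.py
```

Note that --py-files works for pure-Python packages like fuzzywuzzy; packages with compiled extensions generally still need to be installed on each node.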
