How do I pass security details from a Spark client to a Spark cluster and on to HDFS?

Time:01-07

Could anyone provide any useful resources describing security best practices in a distributed Spark environment?

I'm building a simple lab environment that looks a little bit like this:

SPARK_CLIENT | SPARK_MASTER -> SPARK_WORKER[n] | HIVE -> HDFS

For now the workloads mainly involve processing files stored on HDFS, performing some transformations, and then writing the results back to HDFS in Delta format. In the real world, you'd have different files on HDFS accessible to different people, so there has to be a way to pass authentication details from the client to Spark and on to HDFS. In Databricks you'd maybe use an app registration and OAuth2, passed to the file system via conf settings, but can anyone point me towards the correct procedure in an on-prem, classic Spark 3.3.1 environment? I think I need to be looking at Spark delegation tokens, maybe?

Authentication and authorisation

CodePudding user response:

@OneCricketeer is correct.

Here are some links to get you started.

CodePudding user response:

In a distributed Spark environment, you can use Hadoop's security features, such as Kerberos, to pass authentication details from the Spark client to the Spark cluster and on to HDFS.

To use Kerberos in a Spark environment, you will need a Kerberos server (a KDC), principals for the users and services involved, and Kerberos authentication configured on the Spark client, the Spark master, and the Spark workers, as well as on HDFS itself.
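As a rough illustration of the client side, here is a minimal sketch that assembles a spark-submit command carrying Kerberos settings. The configuration keys (spark.kerberos.principal, spark.kerberos.keytab, spark.kerberos.access.hadoopFileSystems) are taken from the Spark security documentation; the principal, keytab path, NameNode URL, script name, and the helper function itself are hypothetical placeholders. It also assumes a cluster manager that distributes delegation tokens to executors (YARN is used here), so check the security documentation for your Spark version before relying on it with a standalone master.

```python
import shlex


def kerberos_submit_command(app, principal, keytab, filesystems, master="yarn"):
    """Build a spark-submit argument list that carries Kerberos settings.

    Hypothetical helper for illustration; only the spark.kerberos.* keys
    come from the Spark documentation.
    """
    conf = {
        # Principal and keytab Spark uses to log in to Kerberos.
        "spark.kerberos.principal": principal,
        "spark.kerberos.keytab": keytab,
        # File systems for which Spark should obtain delegation tokens.
        "spark.kerberos.access.hadoopFileSystems": ",".join(filesystems),
    }
    args = ["spark-submit", "--master", master]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# All values below are placeholders, not real hosts or accounts.
cmd = kerberos_submit_command(
    app="transform_job.py",
    principal="etl_user@EXAMPLE.COM",
    keytab="/etc/security/keytabs/etl_user.keytab",
    filesystems=["hdfs://namenode.example.com:8020"],
)
print(shlex.join(cmd))
```

With a setup like this, HDFS sees the job as the Kerberos principal that submitted it, so the usual HDFS file permissions and ACLs decide which files that user can read or write.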

Here are some resources that may be helpful for setting up and configuring security in a distributed Spark environment:

Apache Spark documentation on security: https://spark.apache.org/docs/latest/security.html

Hadoop documentation on security: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html

It's also a good idea to follow best practices for secure distributed computing, such as using secure networks, implementing proper access controls, and regularly updating and patching your systems.
