I'm trying to retrieve the user name inside a spark-submit task in Databricks so that I can write additional information to the table about the user who changed the data. Unfortunately, I haven't found the correct way to do it. So far, I have tried two things:
spark.sparkContext.sparkUser
and
System.getProperty("user.name")
but both of them return root.
Do you have any idea how to accomplish that?
Answer:
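If you're using Delta Lake tables, information about the operations performed on a table is already captured in the table's history, which you can query with DESCRIBE HISTORY. On Databricks, that history includes the name of the user who performed each operation (see an example in the documentation).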
Databricks also exposes a lot of information via spark.conf: the relevant configuration properties start with spark.databricks.clusterUsageTags., so you can filter all configuration entries by that prefix and look for the information you need.
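A minimal sketch of that filtering step, kept free of Spark so it can be run anywhere. The object name ClusterUsageTags and the method usageTags are hypothetical helpers, not part of any Databricks API; the function just keeps the entries of a configuration map whose key starts with the prefix.

```scala
object ClusterUsageTags {
  // Prefix under which Databricks exposes cluster usage tags in the Spark conf.
  val Prefix: String = "spark.databricks.clusterUsageTags."

  /** Keeps only the configuration entries whose key starts with Prefix. */
  def usageTags(conf: Map[String, String]): Map[String, String] =
    conf.filter { case (key, _) => key.startsWith(Prefix) }
}
```

On a cluster you could feed it with spark.conf.getAll, which returns the runtime configuration as a Map[String, String], for example ClusterUsageTags.usageTags(spark.conf.getAll).toSeq.sorted.foreach { case (k, v) => println(s"$k = $v") }, and then inspect the printed keys to find what you need.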
But keep in mind that all operations in a job are performed under the identity of the job owner, even if the job is triggered by someone else.
The spark.databricks.clusterUsageTags.clusterAllTags configuration property contains a JSON string with the list of cluster tags, which also includes an Owner field with the email of the user who owns that Databricks job.
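A rough sketch of pulling the Owner value out of that JSON string. It assumes (check this against the actual value on your workspace) that each tag is serialized as an object of the form {"key":"Owner","value":"..."}. The object ClusterOwner and the method ownerFromTags are hypothetical names. A regular expression is used only to keep the example dependency-free; a real JSON parser would be more robust.

```scala
import scala.util.matching.Regex

object ClusterOwner {
  // Assumption: each tag looks like {"key":"<name>","value":"<value>"},
  // possibly with whitespace around the separators.
  private val OwnerTag: Regex =
    """\{\s*"key"\s*:\s*"Owner"\s*,\s*"value"\s*:\s*"([^"]*)"\s*\}""".r

  /** Returns the value of the Owner tag, or None if no such tag is found. */
  def ownerFromTags(allTagsJson: String): Option[String] =
    OwnerTag.findFirstMatchIn(allTagsJson).map(_.group(1))
}
```

Inside the job it could be used as spark.conf.getOption("spark.databricks.clusterUsageTags.clusterAllTags").flatMap(ClusterOwner.ownerFromTags), which yields None instead of throwing when the property is not set.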