I have a .NET console application that performs some operations on given inputs and produces outputs. I have written a Spark wrapper around it, and it works fine locally. I am having trouble installing the .NET publish package and its dependencies on an Azure Databricks cluster (the one this notebook is attached to).
using Microsoft.Spark.Sql;
using System;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Spark session
            SparkSession spark = SparkSession
                .Builder()
                .AppName("word_count_sample")
                .GetOrCreate();

            // Register UDFs
            Func<string, string> getName = GetName;
            spark.Udf().Register("UDF_GetName", getName);

            // Create initial DataFrame
            DataFrame dataFrame = spark.Read().Text("input.txt");

            // Count words
            DataFrame words = dataFrame
                .Select(Functions.Split(Functions.Col("value"), " ").Alias("words"))
                .Select(Functions.Explode(Functions.Col("words"))
                    .Alias("word"))
                .GroupBy("word")
                .Count()
                .OrderBy(Functions.Col("count").Desc());

            // Show results
            words.Show();

            // Stop Spark session
            spark.Stop();
        }

        // UDF body: prefixes the given name with a greeting
        public static string GetName(string name)
        {
            return "Hello " + name;
        }
    }
}
Can you please guide me on how to install my dependencies and then invoke the UDFs from a notebook?
What have I done so far?
- I followed https://learn.microsoft.com/en-us/dotnet/spark/tutorials/databricks-deployment and was able to print the expected result under the Driver Logs section. However, the article doesn't explain how to call my UDF "UDF_GetName" from an Azure Databricks notebook.
Any guidance would be appreciated. Thanks!
Answer:
I posted the same question to the Azure support community: https://learn.microsoft.com/en-us/answers/questions/1052175/net-udf-for-apache-spark-must-be-callable-from-azu.html
This is the answer I got there:
" Hi @NeerajKumarSingh-7721,
I could see the steps of deploying your .NET app to databricks workspace. Kindly check this link.
Did you try it? Where exactly are you stuck? If I am not wrong, you can publish your app to Databricks and submit it as a job to run. If you are looking to call some function that is inside your .NET app from a Databricks notebook, then that's not possible. Azure Databricks does not support the C# language. You need to consider using Synapse Analytics in that case, because there we have native support for C# as well.
Please let me know how it goes. "
So the conclusion is: we cannot call .NET UDFs from an Azure Databricks notebook. However, other options are available, for example Azure Synapse, which supports C# natively.
Thanks to MSFT for answering this!
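For anyone who only needs the UDF to run on the cluster (rather than from a notebook cell), here is a minimal sketch of calling it from inside the .NET app itself, which is then published and submitted as a job as described in the deployment tutorial. It reuses the GetName function and the "UDF_GetName" name from the question. The app name "udf_call_sample", the temp view name "names", and the idea that input.txt holds one name per line are my own assumptions for illustration. I have only reasoned this through against the Microsoft.Spark API, not run it on a Databricks cluster.

```csharp
using System;
using Microsoft.Spark.Sql;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Spark session (app name is a placeholder)
            SparkSession spark = SparkSession
                .Builder()
                .AppName("udf_call_sample")
                .GetOrCreate();

            // Register the UDF under the same name used in the question
            Func<string, string> getName = GetName;
            spark.Udf().Register("UDF_GetName", getName);

            // Assumption: input.txt holds one name per line;
            // Text() exposes each line in a column called "value"
            DataFrame names = spark.Read().Text("input.txt");

            // Expose the DataFrame to Spark SQL under a hypothetical view name
            names.CreateOrReplaceTempView("names");

            // Call the registered UDF by name from a SQL query
            DataFrame greetings = spark.Sql(
                "SELECT value, UDF_GetName(value) AS greeting FROM names");
            greetings.Show();

            // Stop Spark session
            spark.Stop();
        }

        // UDF body: prefixes the given name with a greeting
        public static string GetName(string name)
        {
            return "Hello " + name;
        }
    }
}
```

The registration only lives in the Spark session that the .NET app creates, so the query has to be issued from the same app. The output of Show() should end up in the driver logs of the job run, the same place the word count result appeared.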