Loop across a dataframe in .NET Spark

Time:11-09

I have a DataFrame in Spark (created by reading a CSV). How do I loop over its rows in C#? The DataFrame has 10 rows and 3 columns, and I would like to get the value of each column as I go through the rows one by one. Below is what I am trying:

foreach (var obj in df)
{
  Console.WriteLine("test");
}

This fails to compile with the error:

foreach statement cannot operate on variables of type 'DataFrame' because 'DataFrame' does not contain a public instance definition for 'GetEnumerator'

Answer:

A DataFrame is a reference to data that lives on the Spark cluster, not a local collection, which is why it cannot be enumerated directly. If you want to read the actual values in your application (as opposed to running a transformation and writing the output, which is the typical use case), you need to collect the data over to your application with Collect():

https://learn.microsoft.com/en-us/dotnet/api/microsoft.spark.sql.dataframe.collect?view=spark-dotnet

foreach (var obj in df.Collect())
{
  Console.WriteLine("test");
}

Collect() gives you an enumerable of Row. Each Row has a Values property, which is an object array holding the actual values for that row.
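To make the row-by-row access concrete, here is a minimal sketch of reading each column value inside the loop. The file name people.csv, the header option, and the application name are assumptions for illustration and are not from the question; the program also assumes it is launched through spark-submit with the Microsoft.Spark package available.

```csharp
using System;
using Microsoft.Spark.Sql;

class Program
{
    static void Main()
    {
        // Assumption: the app name is arbitrary and chosen for this sketch.
        SparkSession spark = SparkSession.Builder().AppName("row-loop").GetOrCreate();

        // Assumption: "people.csv" is a hypothetical file with a header row.
        DataFrame df = spark.Read().Option("header", true).Csv("people.csv");

        // Collect() pulls every row to the driver, so keep it to small results.
        foreach (Row row in df.Collect())
        {
            // Values holds one entry per column, in column order.
            object[] values = row.Values;
            for (int i = 0; i < values.Length; i++)
            {
                Console.WriteLine($"column {i}: {values[i]}");
            }
        }

        spark.Stop();
    }
}
```

For a 10-row DataFrame like the one in the question, collecting everything is fine. For larger data, it is usually safer to reduce the result first (for example with Limit) before collecting.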

If you just want to see the contents for debugging, you can call:

df.Show();

Show also accepts arguments: the first is the number of rows to print and the second is the number of characters each value is truncated to. Pass larger values if your data is being cut off and you need to see the full contents of every column:

df.Show(100, 10000);