I have a DataFrame (created by reading a CSV file) in Spark. How do I loop over the rows of this DataFrame in C#? It has 10 rows and 3 columns, and I would like to get the value of each column as I go through the rows one by one. Below is what I am trying:
foreach (var obj in df)
{
    Console.WriteLine("test");
}
This fails with the following compiler error:
foreach statement cannot operate on variables of type 'DataFrame' because 'DataFrame' does not contain a public instance definition for 'GetEnumerator'
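The DataFrame is a reference to the actual data on the Spark cluster, so it cannot be enumerated directly. If you want to see the actual data (as opposed to running some transformations and writing the output, which is the typical use case), you need to collect the data into your application. Collect() copies every row to the driver, so only use it on results small enough to fit in memory, such as your 10 rows.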
https://learn.microsoft.com/en-us/dotnet/api/microsoft.spark.sql.dataframe.collect?view=spark-dotnet
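// Collect() returns IEnumerable<Row>; Row needs "using Microsoft.Spark.Sql;"
foreach (Row row in df.Collect())
{
    // Row.Values is an object[] holding this row's column values
    Console.WriteLine(string.Join(", ", row.Values));
}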
This gives you an enumerable of Row objects. Each Row has a Values property, which is an object array holding the actual values of that row's columns.
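To read each of the columns individually as you go through the rows, here is a minimal sketch. It assumes a hypothetical input file data.csv with a header row, and because schema inference is not enabled, every CSV column comes back as a string (or null). It uses Row.Get(int) to read a column by position and DataFrame.Columns() for the column names; the app name "iterate-rows" is also just a placeholder. Like any .NET for Apache Spark app, it has to be launched through spark-submit rather than run directly.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Spark.Sql;

class Program
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().AppName("iterate-rows").GetOrCreate();

        // Hypothetical path; "header" tells Spark to use the first line as column names.
        DataFrame df = spark.Read().Option("header", "true").Csv("data.csv");

        // Column names in the same order as the values inside each Row.
        IReadOnlyList<string> columns = df.Columns();

        // Collect() brings all rows to the driver; fine for a small DataFrame.
        foreach (Row row in df.Collect())
        {
            for (int i = 0; i < columns.Count; i++)
            {
                // Get(i) returns the i-th column value as object (a string here, since no schema is inferred).
                object value = row.Get(i);
                Console.WriteLine($"{columns[i]} = {value}");
            }
            Console.WriteLine();
        }

        spark.Stop();
    }
}
```

If you already know a column's name and type, row.GetAs<string>("columnName") reads a single value by name without casting from object yourself.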
If you just want to see the contents for debugging, you can do:
df.Show();
Show also accepts optional arguments: the first is the number of rows to display, and the second is the maximum number of characters shown in each cell. Raise the second one if your data is being truncated and you need to see the full values:
df.Show(100, 10000);