Spark Print the Shape of my DataFrame in Scala

Time:11-06

In Pandas, I can get the shape of my DataFrame, which gives a result like

[total number of rows, total number of columns]

In PySpark, I can use the following expression to get the shape of my DataFrame:

print((df.count(), len(df.columns)))

How do I do the same in Scala? Is this also an efficient approach for larger datasets?

CodePudding user response:

The solution is almost the same as in Python. The DataFrame documentation lists two methods that do exactly what you want: count() and columns.

count() returns the number of rows in the DataFrame, and columns returns an array of all column names (in Scala it is called without parentheses). To get the number of columns, take the length of that array. As for efficiency, count() is an action that runs a Spark job over the data, so its cost grows with the dataset, whereas columns only reads the schema and is cheap.

TL;DR: use df.count() for the number of rows and df.columns.length for the number of columns.
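
Putting the two calls together, a minimal sketch might look like the following. The shape helper, the ShapeExample object, the local SparkSession setup, and the sample data are all assumptions made for illustration; none of them come from Spark itself or from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ShapeExample {
  // Hypothetical helper (not a Spark API): returns (row count, column count).
  // count() triggers a Spark job; columns.length only inspects the schema.
  def shape(df: DataFrame): (Long, Int) = (df.count(), df.columns.length)

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only; a real job would use its own config.
    val spark = SparkSession.builder()
      .appName("shape-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Small sample DataFrame with 3 rows and 2 columns.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "label")

    println(shape(df)) // prints (3,2)

    spark.stop()
  }
}
```

If the DataFrame is going to be reused after counting, calling df.cache() beforehand can avoid recomputing it, at the cost of memory.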
