Pandas can give me the shape of my DataFrame, with a result like
[total number of rows, total number of columns]
In PySpark, I can use the following to get the shape of my DataFrame:
print((df.count(), len(df.columns)))
How do I do the same in Scala? And is this an efficient way to do it for larger datasets?
CodePudding user response:
The solution is almost the same as in Python. Looking at the documentation of DataFrame, you can see two relevant methods, count() and columns, which do exactly what you want. count() returns the number of rows in the DataFrame, and columns returns an array of all column names. To get the number of columns, just take the length of that array.
TL;DR: df.count() for the number of rows and df.columns.length for the number of columns.
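As a minimal sketch of the above, the two calls can be wrapped in a small helper. The object name DataFrameShape, the helper shape, and the sample data are made up for illustration and are not part of the Spark API; the example also assumes a local SparkSession is available.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameShape {
  // Hypothetical helper (not a Spark built-in): returns (rows, columns).
  // count() returns a Long, columns.length returns an Int.
  def shape(df: DataFrame): (Long, Int) =
    (df.count(), df.columns.length)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; on a cluster you would
    // normally reuse the existing session.
    val spark = SparkSession.builder()
      .appName("shape-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Small sample DataFrame with 3 rows and 2 columns.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "label")

    println(shape(df)) // prints (3,2)

    spark.stop()
  }
}
```

On efficiency: columns only reads the schema, so it does not touch the data. count() is an action and runs a Spark job over the DataFrame, so on a large dataset it costs about as much as it does in PySpark. If the same DataFrame is used again afterwards, caching it first may avoid recomputing it, depending on your pipeline.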