I have a table on DBFS that I can read with PySpark, but I only need its length (the number of rows). I know I could just read the file and call table.count() to get it, but that would take some time.
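For reference, this is what I am doing now (the path is just a placeholder):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the table and count the rows; this triggers a full Spark job.
df = spark.read.format("delta").load("dbfs:/path/to/my_table")
print(df.count())
```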
Is there a better way to solve this?
CodePudding user response:
I am afraid not.
Since you are using DBFS, I assume you are using the Delta format with Databricks. In theory, you could check the metastore, but per the Delta documentation:

"The metastore is not the source of truth about the latest information of a Delta table."
https://docs.delta.io/latest/delta-batch.html#control-data-location
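To illustrate the caveat, here is a minimal sketch of what reading a row count out of the metastore could look like, assuming the table is registered in the metastore under the hypothetical name my_table. Note that the statistics only reflect the last ANALYZE run, so they can be stale, and populating them scans the table anyway, which is why this is not actually cheaper than a plain count():

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Populate the metastore statistics. This scans the table, so at the
# moment you run it, it is no cheaper than a plain count().
spark.sql("ANALYZE TABLE my_table COMPUTE STATISTICS")

# Read the statistics back out of the metastore. The "Statistics" row
# contains a string such as "1234 bytes, 56 rows", but it only reflects
# the state of the table as of the last ANALYZE, not its current state.
stats = (
    spark.sql("DESCRIBE TABLE EXTENDED my_table")
    .filter("col_name = 'Statistics'")
    .collect()
)
print(stats)
```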