Question: This document from Azure Databricks describes tasks you can perform on the Databricks File System (DBFS) using dbutils. Is there a simple method to find the total number of files in a folder inside DBFS?
CodePudding user response:
You can use either of these ways to get the count:
Option 1:
dbutils.fs.ls() returns a list of FileInfo objects, one for each entry (files and subdirectories) directly under the specified path. Call len() on the returned list to get the count:
len(dbutils.fs.ls('/FileStore/tables/'))
Or
Option 2:
Use os.listdir through the local /dbfs FUSE mount, which exposes DBFS paths to standard Python file APIs on the driver:
import os
paths = os.listdir('/dbfs/FileStore/tables')
print(len(paths))
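One caveat worth noting: os.listdir returns subdirectories as well as files, so the count above is an entry count, not strictly a file count. The sketch below illustrates the difference on a temporary local directory (the paths here are hypothetical stand-ins; on Databricks you would point os.listdir at a /dbfs/... path instead):

```python
import os
import tempfile

# Build a small sample tree: two files plus one subdirectory.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "subdir"))
for name in ("a.csv", "b.csv"):
    with open(os.path.join(root, name), "w") as f:
        f.write("x")

# os.listdir counts directories as well as files.
print(len(os.listdir(root)))  # 3 entries

# Filter with os.path.isfile to count regular files only.
files_only = [p for p in os.listdir(root)
              if os.path.isfile(os.path.join(root, p))]
print(len(files_only))  # 2 files
```

The same isfile filter works unchanged against /dbfs mount paths.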
CodePudding user response:
Glob's file-matching patterns can also be helpful for pinpointing certain file types or keeping directory names out of the listing:
from glob import glob
len(glob("/dbfs/mnt/raw/datasource/dataset/*.parquet"))
Out[17]: 47
glob also supports optional recursive search: pass recursive=True and use the ** pattern to descend into subdirectories.
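A minimal sketch of the recursive form, using a temporary local tree with made-up partition names (on Databricks the root would be a /dbfs/... path):

```python
import os
import tempfile
from glob import glob

# Sample nested layout: one parquet file at each of three levels.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "year=2023", "month=01"))
for rel in ("top.parquet",
            os.path.join("year=2023", "a.parquet"),
            os.path.join("year=2023", "month=01", "b.parquet")):
    open(os.path.join(root, rel), "w").close()

# Non-recursive: matches only the top level.
print(len(glob(os.path.join(root, "*.parquet"))))  # 1

# recursive=True with ** walks every subdirectory (and matches
# the top level too, since ** can match zero directories).
print(len(glob(os.path.join(root, "**", "*.parquet"), recursive=True)))  # 3
```

Note that on a large DBFS folder the recursive FUSE walk can be slow; dbutils.fs.ls is usually faster for shallow listings.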