Databricks File System: How to count files in a folder using a notebook

Time:07-26

Question: This document from Azure Databricks describes tasks you can perform on the Databricks File System (DBFS) using dbutils. Is there a simple way to find the total number of files in a folder inside DBFS?

CodePudding user response:

You can use either of the following ways to get the count:

Option 1:

dbutils.fs.ls() returns a list with the file info for every entry in the specified path.

Use len() on the returned list to get the number of entries in that path:

 len(dbutils.fs.ls('/FileStore/tables/'))
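Note that dbutils.fs.ls() does not descend into subfolders, and the list it returns includes the subfolders themselves. If you need files only, at any depth, a minimal sketch could look like the following. The helper name count_files is made up for illustration, and it assumes each returned entry exposes .path and .isDir(), as the FileInfo objects from dbutils.fs.ls do.

```python
def count_files(ls, path):
    """Count files under `path`, descending into subdirectories.

    `ls` is a listing function such as dbutils.fs.ls. Each entry it returns
    is assumed to expose `.path` and `.isDir()` (hypothetical helper, not
    part of dbutils itself).
    """
    total = 0
    for entry in ls(path):
        if entry.isDir():
            # Recurse into the subfolder instead of counting it
            total += count_files(ls, entry.path)
        else:
            total += 1
    return total

# In a Databricks notebook, where dbutils is predefined:
# count_files(dbutils.fs.ls, '/FileStore/tables/')
```

Passing the listing function in as an argument keeps the helper usable outside a notebook, for example with a stub in a unit test. On folders with a very large number of files, the recursive listing can be slow.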

Or

Option 2:

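import os

# DBFS is exposed on the driver's local filesystem under /dbfs
paths = os.listdir('/dbfs/FileStore/tables')

# listdir returns both files and subdirectories in the folder
print(len(paths))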
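Since os.listdir() also returns subdirectory names, the number above can be higher than the real file count. A small sketch that counts regular files only is shown here; the function name count_files_only is hypothetical, and the example path is the one from the answer above.

```python
import os

def count_files_only(folder):
    """Return the number of regular files directly inside `folder`."""
    return sum(
        1
        for name in os.listdir(folder)
        # Skip subdirectories; keep regular files only
        if os.path.isfile(os.path.join(folder, name))
    )

# Example call on a Databricks driver:
# count_files_only('/dbfs/FileStore/tables')
```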

CodePudding user response:

Glob's file-matching patterns can also be helpful for pinpointing certain file types or for avoiding directory names in the count:

from glob import glob
len(
    glob("/dbfs/mnt/raw/datasource/dataset/*.parquet")
)

Out[17]: 47

glob() also supports an optional recursive search: with recursive=True, the ** pattern matches files in nested subdirectories as well.
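As a rough illustration of the recursive option, the sketch below counts matching files at any depth under a folder. The function name count_matching and its default pattern are assumptions for the example, not part of the glob module.

```python
import os
from glob import glob

def count_matching(root, pattern="*.parquet"):
    """Count files under `root`, at any depth, whose names match `pattern`."""
    # With recursive=True, "**" matches zero or more directory levels
    matches = glob(os.path.join(root, "**", pattern), recursive=True)
    # Guard against directories whose names happen to match the pattern
    return sum(1 for p in matches if os.path.isfile(p))

# Example call on a Databricks driver:
# count_matching('/dbfs/mnt/raw/datasource/dataset')
```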
