So I am trying to read in a folder that may sometimes be empty.
The folder is called ABC.csv and has no CSV files in it.
df = spark.read.parquet("/Users/test/Downloads/ABC.csv")
How do I return None or an empty DataFrame when reading it in, since it may sometimes have contents?
CodePudding user response:
Sample code snippet. Please modify it based on your input files.
import glob

list_of_files = glob.glob("D:/data/in/dcad_data/*.csv")
if list_of_files:
    # create the DataFrame here, e.g.
    # df = spark.read.csv(list_of_files, header=True)
    pass
else:
    df = None
print(df)
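The glob-based check above can be exercised without Spark at all; the snippet below is a small stand-alone sketch of the same idea, using a temporary directory in place of the real data folder:

```python
import glob
import os
import tempfile

# Create a temporary directory to play the role of the data folder.
with tempfile.TemporaryDirectory() as folder:
    # Empty folder: glob finds nothing, so we would set df = None.
    files = glob.glob(os.path.join(folder, "*.csv"))
    print(files)  # []

    # Add a CSV file; now the glob is non-empty and we would read it.
    with open(os.path.join(folder, "data.csv"), "w") as f:
        f.write("a,b\n1,2\n")
    files = glob.glob(os.path.join(folder, "*.csv"))
    print(len(files))  # 1
```

The truthiness test `if list_of_files:` works because `glob.glob` returns an empty list when nothing matches.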
CodePudding user response:
You can check whether the folder is empty using plain Python, like this:
import os
from pyspark.sql.types import StructType

# path of the directory
path = "/Users/test/Downloads/ABC.csv"

# get the list of entries in the directory
dir_contents = os.listdir(path)

# check whether the directory is empty
if len(dir_contents) == 0:
    df = spark.createDataFrame([], StructType([]))
else:
    df = spark.read.parquet(path)
Or, if you want to check specifically whether any Parquet files are present in the folder, do this:
import glob
import os.path
from pyspark.sql.types import StructType

# path of the directory
path = "/Users/test/Downloads/ABC.csv"
parquet_files = glob.glob(os.path.join(path, '*.parquet'))

# check whether the list is empty
if len(parquet_files) == 0:
    df = spark.createDataFrame([], StructType([]))
else:
    df = spark.read.parquet(path)
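Both checks can be folded into one small helper. The function name and the empty-list fallback for a missing path below are my own additions, not part of the answers above; note that `os.listdir` raises `FileNotFoundError` if the path does not exist, which this guard avoids:

```python
import glob
import os

def list_parquet_files(path):
    """Return the .parquet files in `path`, or [] if the folder
    is empty or does not exist."""
    if not os.path.isdir(path):
        return []
    return glob.glob(os.path.join(path, "*.parquet"))

# The Spark side would stay the same, e.g.:
# files = list_parquet_files("/Users/test/Downloads/ABC.csv")
# df = spark.read.parquet(*files) if files else None
```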