Return None or empty dataframe when reading in input that is empty in PYSPARK


So I am trying to read in a folder that may sometimes be empty.

The folder is called ABC.csv and has no files in it.

df = spark.read.parquet("/Users/test/Downloads/ABC.csv")

How do I return None or an empty dataframe when reading it in, since sometimes it may have contents?

CodePudding user response:

Sample code snippet. Please modify based on your input files.

    import glob

    list_of_files = glob.glob("D:/data/in/dcad_data/*.csv")
    if list_of_files:
        # create the dataFrame; adjust the reader and options to match your files
        df = spark.read.csv(list_of_files, header=True)
    else:
        df = None
    print(df)
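
Note that spark.read.csv accepts either a single path or a list of paths, so the result of the glob can be passed straight to the reader.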

CodePudding user response:

You can check whether the folder is empty or not with plain Python, like this:

import os

from pyspark.sql.types import StructType

# path of the directory
path = "/Users/test/Downloads/ABC.csv"

# Getting the list of entries in the directory
dir_contents = os.listdir(path)

# Checking if the list is empty or not
if len(dir_contents) == 0:
    df = spark.createDataFrame([], StructType([]))
else:
    df = spark.read.parquet("/Users/test/Downloads/ABC.csv")

or, if you want to check only whether parquet files are present in the folder, do this:

import glob
import os.path

from pyspark.sql.types import StructType

# path of the directory
path = "/Users/test/Downloads/ABC.csv"

parquet_files = glob.glob(os.path.join(path, '*.parquet'))

# Checking if the list is empty or not
if len(parquet_files) == 0:
    df = spark.createDataFrame([], StructType([]))
else:
    df = spark.read.parquet("/Users/test/Downloads/ABC.csv")
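
Putting the two checks together, here is a minimal sketch of a helper that returns None when the folder has no parquet files and a DataFrame otherwise (it assumes an existing SparkSession named spark; the function name and the path are only examples):

import glob
import os.path

def read_parquet_or_none(spark, path):
    # Return None if the folder contains no parquet files,
    # otherwise read the whole folder as parquet
    parquet_files = glob.glob(os.path.join(path, '*.parquet'))
    if not parquet_files:
        return None
    return spark.read.parquet(path)

df = read_parquet_or_none(spark, "/Users/test/Downloads/ABC.csv")
print(df)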