There is an HDFS directory
/home/path/date=2022-12-02, where date=2022-12-02 is a partition.
A Parquet file with the partition "date=2022-12-02" has been written to this directory.
To read the file together with its partition column, I use:
```scala
spark
  .read
  .option("basePath", "/home/path")
  .parquet("/home/path/date=2022-12-02")
```
The file is read successfully with all partition fields.
However, the partition folder ("date=2022-12-02") disappears from the directory.
I can't figure out what causes this or how to avoid it.
Answer:
There are two ways to get the date as part of your table:
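1. Read the parent directory instead of the partition directory, so that Spark discovers the `date` partition column on its own:

   ```scala
   spark.read.parquet("/home/path/")
   ```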
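2. Add a new column using the `input_file_name()` function, then manipulate the returned string until you get the date column. This should be fairly easy: the function returns the full path of the file, so take the directory segment just before the file name (e.g. `date=2022-12-02`), split it on the equals sign and take index 1.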
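A minimal, Spark-free sketch of that string manipulation, assuming Hive-style `key=value` directory names in the path returned by `input_file_name()`. `PartitionPath` and `partitionValue` are hypothetical names for illustration, not part of any Spark API, and the sample path is made up. Inside a DataFrame the same extraction can also be written with built-in functions, e.g. `df.withColumn("date", regexp_extract(input_file_name(), "date=([^/]+)", 1))`.

```scala
// Hypothetical helper (not a Spark API): extracts the value of a Hive-style
// partition segment such as "date=2022-12-02" from a full file path, like
// the string Spark's input_file_name() returns for each row.
object PartitionPath {
  def partitionValue(path: String, key: String): Option[String] =
    path
      .split("/")                      // break the path into segments
      .find(_.startsWith(key + "="))   // pick the segment like "date=..."
      .map(_.split("=", 2)(1))         // keep everything after the first '='

  def main(args: Array[String]): Unit = {
    // Made-up example path; real paths depend on your cluster and file names.
    val file = "hdfs://namenode:8020/home/path/date=2022-12-02/part-00000.snappy.parquet"
    println(partitionValue(file, "date")) // prints Some(2022-12-02)
  }
}
```

Searching for the `date=` segment instead of relying on a fixed position (such as the last segment, which is the file name) keeps the helper working no matter how deep the base path is.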
I don't think there is another way to do what you want directly.