Scala Spark read with partitions drop partitions


There is an HDFS directory:

/home/path/date=2022-12-02, where date=2022-12-02 is a partition.

A Parquet file has been written to this directory under the partition "date=2022-12-02".

To read the file with its partition, I use:

   spark
        .read
        .option("basePath", "/home/path")
        .parquet("/home/path/date=2022-12-02")

The file is read successfully, with all partition fields.

But the partition field ("date=2022-12-02") is dropped from the result.

I can't grasp what the reason is, or how to avoid it.

CodePudding user response:

There are two ways to have the date as part of your table:

  1. Read the path like this: .parquet("/home/path/")

  2. Add a new column using the input_file_name() function, then manipulate the string until you get a date column (fairly easy: take the last directory segment of the path, split it on the equals sign, and take index 1).

I don't think there is another way to do what you want directly.
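A minimal sketch of option 2, assuming the standard layout /home/path/date=YYYY-MM-DD/part-*.parquet and an existing SparkSession named `spark`; it uses `regexp_extract` as a one-step alternative to the split-on-slash/equals manipulation described above:

```scala
import org.apache.spark.sql.functions.{input_file_name, regexp_extract}

// Read the single partition directory, then recover the partition value
// from each row's source-file path and expose it as a "date" column.
val df = spark.read
  .parquet("/home/path/date=2022-12-02")
  .withColumn("date",
    regexp_extract(input_file_name(), "date=([^/]+)", 1))
```

The regex captures whatever follows "date=" up to the next slash, so it yields "2022-12-02" for every row read from that partition directory.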
