Scala Spark read with partitions drop partitions


There is an HDFS directory:

/home/path/date=2022-12-02, where date=2022-12-02 is a partition.

A Parquet file has been written to this directory under the partition "date=2022-12-02".

To read the file with its partition, I use:

   spark
        .read
        .option("basePath", "/home/path")
        .parquet("/home/path/date=2022-12-02")

The file is read successfully, with all partition fields.

But the partition field ("date=2022-12-02") is dropped from the result.

I can't grasp what the reason is, or how to avoid it.

CodePudding user response:

There are two ways to have the date as part of your table:

  1. Read the path like this: .parquet("/home/path/")

  2. Add a new column using the input_file_name() function, then manipulate the string until you get a date column (fairly easy: take the last directory segment of the path, split it on the equals sign, and take index 1).

I don't think there is another way to do what you want directly.
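A minimal sketch of option 2, assuming the standard layout /home/path/date=YYYY-MM-DD/part-*.parquet and an existing SparkSession named `spark`; it uses `regexp_extract` as a one-step alternative to the split-on-slash/equals manipulation described above:

```scala
import org.apache.spark.sql.functions.{input_file_name, regexp_extract}

// Read the single partition directory, then recover the partition value
// from each row's source-file path and expose it as a "date" column.
val df = spark.read
  .parquet("/home/path/date=2022-12-02")
  .withColumn("date",
    regexp_extract(input_file_name(), "date=([^/]+)", 1))
```

The regex captures whatever follows "date=" up to the next slash, so it yields "2022-12-02" for every row read from that partition directory.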
