We use Spark as our data processing platform, with Scala as the programming language. When we write data to a storage account (ADLS Gen2), we partition it by a datetime column of type java.sql.Timestamp. We write the data using the Spark DataFrame write operation.
By default, it creates the following path on the storage account and writes the Parquet files into it:
Path - __datetime=a/b/c/yyyy-MM-dd HH:mm:ss
The problem is that it encodes the : characters but not the space, and because the path is only partially URL-encoded, it causes problems for us. Is there a fix for this?
Can I change the format of the column (of type java.sql.Timestamp) so that the output file path looks like one of the following, which need no encoding?
__datetime=a/b/c/yyyy-MM-dd-HH-mm-ss
or
__datetime=a/b/c/yyyy_MM_dd_HH_mm_ss
Is it possible to do this with the java.sql.Timestamp object itself, without converting it to a string?
Thanks
CodePudding user response:
You can change the name or type of a DataFrame column with a simple select and alias.
The encoding is necessary, though, because file paths cannot contain : characters, but they can contain spaces. It's unclear why you need full URL encoding.
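A java.sql.Timestamp has no display format of its own, so the path layout can only change if the partition value is a formatted string. Here is a minimal sketch of that idea, assuming the built-in date_format function is acceptable for your case. The object name, the helper withPathSafePartition, the derived column name __datetime_part, and the output path /tmp/partition-demo are all made up for illustration. The original timestamp column is left untouched, so the data in the Parquet files keeps its timestamp type; only the derived partition column is a string.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format}

object PartitionByFormattedTimestamp {
  // Pattern built only from digits and hyphens, so the partition
  // directory name contains no ':' or space that would need escaping.
  val PathSafePattern = "yyyy-MM-dd-HH-mm-ss"

  // Adds a string column derived from the timestamp column.
  // The timestamp column itself is not modified.
  def withPathSafePartition(df: DataFrame, tsColumn: String, partitionColumn: String): DataFrame =
    df.withColumn(partitionColumn, date_format(col(tsColumn), PathSafePattern))

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder()
      .appName("partition-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real DataFrame.
    val df = Seq((1, Timestamp.valueOf("2023-01-15 10:30:45"))).toDF("id", "__datetime")

    val out = withPathSafePartition(df, "__datetime", "__datetime_part")

    // Partition by the derived string column instead of the raw timestamp.
    out.write
      .mode("overwrite")
      .partitionBy("__datetime_part")
      .parquet("/tmp/partition-demo") // hypothetical output location

    spark.stop()
  }
}
```

If you prefer underscores, changing the pattern to yyyy_MM_dd_HH_mm_ss should work the same way. Keep in mind that readers of the data will see the partition column as a string, so filters on it need to use the same format.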