Reading a binary-format Parquet column with Spark SQL

Hi everyone. I used the Parquet API to write a file with two columns, id and content, where content is stored in binary format and is not human-readable. I am now reading the file with Spark SQL (Spark 1.6, written in Java) and want to extract the binary column for processing. Through the Spark Row I can only get the data as a String; I cannot find an interface for binary data, and converting the String to a byte array does not give back the original bytes. Could someone please advise how to retrieve the content column intact? Thank you very much.

CodePudding user response:

Did you try:

sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
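
For context on why the String route in the question loses data, here is a minimal plain-Java sketch with no Spark dependency. The class name, the helper `throughString`, and the sample payload are hypothetical and only for illustration. It assumes UTF-8 is the charset used for the conversion. Bytes that are not valid UTF-8 get replaced during decoding, so encoding the String again cannot restore them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTrip {

    // Decodes raw bytes as UTF-8 text and encodes that text back to bytes.
    // This mimics reading a binary column as a String and converting it back.
    static byte[] throughString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical payload: 0xFF and 0xFE never occur in valid UTF-8.
        byte[] original = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        byte[] roundTripped = throughString(original);

        System.out.println("original:      " + Arrays.toString(original));
        System.out.println("round-tripped: " + Arrays.toString(roundTripped));
        System.out.println("identical:     " + Arrays.equals(original, roundTripped));
    }
}
```

If the goal is to get the raw bytes, it may be worth trying the opposite of the setting above: as far as I know, spark.sql.parquet.binaryAsString defaults to "false", and with that value the column should keep the binary type, so the value can be fetched from the Row as a byte[] (for example by casting the result of row.get(i)) rather than as a String. I have not checked this against your file, so treat it as something to try.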