I have the following PySpark DataFrame:
df = spark.sql("select unhex('0A54C9E024AA62F9EF8BE39231782F9240B51CFB82D1CF7586F734EE07B51086') as db_key")
As you can see, it has only one column, "db_key", with a single value: the result of applying unhex to the token 0A54C9E024AA62F9EF8BE39231782F9240B51CFB82D1CF7586F734EE07B51086. If I call display on this DataFrame, I get the following result:
display(df)
But if I execute show(), I get this result:
df.show()
I want to obtain the same string I get with display, but using show(). I tried casting like this, but the result is not what I want:
df = spark.sql("select cast(unhex('0A54C9E024AA62F9EF8BE39231782F9240B51CFB82D1CF7586F734EE07B51086') AS STRING) as db_key")
df.show()
What can I do?
CodePudding user response:
When you see an = (equals) sign at the end of a string, it is probably Base64-encoded. Fortunately, Spark has a built-in base64 function for that:
from pyspark.sql import functions as F
df.withColumn("db_key_str", F.base64(F.col("db_key"))).show()
+--------------------+--------------------+
|              db_key|          db_key_str|
+--------------------+--------------------+
|[0A 54 C9 E0 24 A...|ClTJ4CSqYvnvi+OSM...|
+--------------------+--------------------+
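To sanity-check the result without a Spark session, the same conversion can be sketched in plain Python. This is only an illustration, assuming Spark's unhex and base64 behave like the standard library's binascii.unhexlify and base64.b64encode (standard alphabet, with = padding); the variable names are made up for the example.

```python
import base64
import binascii

HEX_TOKEN = "0A54C9E024AA62F9EF8BE39231782F9240B51CFB82D1CF7586F734EE07B51086"

# Counterpart of unhex(): hex string -> raw bytes (32 bytes for this token).
raw = binascii.unhexlify(HEX_TOKEN)

# Counterpart of F.base64(): raw bytes -> Base64 text.
# 32 bytes is not a multiple of 3, so the output ends with "=" padding.
encoded = base64.b64encode(raw).decode("ascii")

print(len(raw))   # number of raw bytes
print(encoded)    # full Base64 string, not truncated
```

Note that show() cuts every cell to 20 characters by default, which is why the table above ends in "...". Calling df.show(truncate=False) should print the complete Base64 string.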