How to convert this pyspark binary column to string?

Time:11-08

I have the following PySpark dataframe:

df = spark.sql("select unhex('0A54C9E024AA62F9EF8BE39231782F9240B51CFB82D1CF7586F734EE07B51086') as db_key")

As you can see, it has a single column, "db_key", holding a single value: the result of applying unhex to the token 0A54C9E024AA62F9EF8BE39231782F9240B51CFB82D1CF7586F734EE07B51086. If I call display on this dataframe, I get the following result:

display(df)

[Screenshot: output of display(df)]

But if I call show(), I get this result:

df.show()

[Screenshot: output of df.show()]

I want to get the same string that display shows, but using show(). I tried casting the value like this, but the result is not what I want:

df = spark.sql("select cast(unhex('0A54C9E024AA62F9EF8BE39231782F9240B51CFB82D1CF7586F734EE07B51086') AS STRING) as db_key")
df.show()

[Screenshot: output of df.show() after the cast]

What can I do?

Answer:

When you see an = (equals) sign at the end of a string, it is probably Base64-encoded. Fortunately, Spark has a built-in base64 function for this:

from pyspark.sql import functions as F


df.withColumn("db_key_str", F.base64(F.col("db_key"))).show()
+--------------------+--------------------+
|              db_key|          db_key_str|
+--------------------+--------------------+
|[0A 54 C9 E0 24 A...|ClTJ4CSqYvnvi+OSM...|
+--------------------+--------------------+
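
To check the idea outside Spark, here is a minimal sketch in plain Python that mirrors the two steps above. It assumes that bytes.fromhex and base64.b64encode are fair stand-ins for Spark's unhex and base64; the names HEX_TOKEN, raw, encoded and is_valid_utf8 are only illustrative. It also hints at why the cast to STRING looked wrong: the raw bytes do not form valid UTF-8 text.

```python
import base64

# The hex token from the question.
HEX_TOKEN = "0A54C9E024AA62F9EF8BE39231782F9240B51CFB82D1CF7586F734EE07B51086"

# Stand-in for Spark's unhex(): hex text -> raw bytes (32 bytes here).
raw = bytes.fromhex(HEX_TOKEN)

# Stand-in for F.base64(): raw bytes -> printable ASCII string.
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)

# The raw bytes are not valid UTF-8, so reading them as text cannot
# give a clean string; this is the likely reason the cast looked wrong.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
print(is_valid_utf8)
```

Note that show() truncates cells longer than 20 characters by default, which is why the table above ends in "...". Calling show(truncate=False) on the dataframe should print the full Base64 string.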