Insert data into a single column in dictionary format after concatenating a few columns of data

Time:06-30

In PySpark, I want to concatenate several columns into a single column and store the result in dictionary format.

I have already concatenated the data into a single column, but I am unable to store it in dictionary format. Please see the screenshot below for details, and let me know if you need more information.

[Screenshot not included]

CodePudding user response:

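Since you already have the concatenated string, you can use the Spark SQL function str_to_map. It splits the string into pairs on the first delimiter (',' here) and splits each pair into key and value on the second delimiter, which defaults to ':'.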

from pyspark.sql import functions as F

# Assumes an existing SparkSession named `spark`.
# One-row example: ',' separates pairs, ':' separates key from value.
df = spark.createDataFrame([("datatype:0,length:1",)], ['region_validation_check_status'])

# Parse the concatenated string into a map column.
df = df.withColumn(
    'region_validation_check_status',
    F.expr("str_to_map(region_validation_check_status, ',')")
)
df.show(truncate=False)
# +------------------------------+
# |region_validation_check_status|
# +------------------------------+
# |{datatype -> 0, length -> 1}  |
# +------------------------------+
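
To make the splitting behavior concrete, here is a rough plain-Python model of what str_to_map does with its two delimiters. This is only an illustrative sketch, not Spark code: the function name str_to_map_model is hypothetical, and it ignores details such as regex delimiters and Spark's duplicate-key handling.

```python
def str_to_map_model(text, pair_delim=",", key_value_delim=":"):
    """Approximate str_to_map: split into pairs, then split each pair once."""
    result = {}
    for pair in text.split(pair_delim):
        # partition splits on the first occurrence of the key/value delimiter only
        key, sep, value = pair.partition(key_value_delim)
        # a pair without the delimiter gets a null (None) value
        result[key] = value if sep else None
    return result


parsed = str_to_map_model("datatype:0,length:1")
print(parsed)  # {'datatype': '0', 'length': '1'}
```

Note that the values come back as strings, which matches str_to_map producing a map of string keys to string values; cast afterwards if you need numbers.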

If you do not have a concatenated string yet, you can build the map directly from the column values with to_json and from_json:

from pyspark.sql import functions as F

# Assumes an existing SparkSession named `spark`.
df = spark.createDataFrame([(1, 2), (3, 4)], ['a', 'b'])
df.show()
# +---+---+
# |  a|  b|
# +---+---+
# |  1|  2|
# |  3|  4|
# +---+---+

# Pack the columns into a struct, serialize it to JSON,
# then parse that JSON back as a map<string, int>.
df = df.select(
    F.from_json(F.to_json(F.struct('a', 'b')), 'map<string, int>')
)
df.show()
# +----------------+
# |         entries|
# +----------------+
# |{a -> 1, b -> 2}|
# |{a -> 3, b -> 4}|
# +----------------+