I want to concatenate several columns into a single column in PySpark, with the result stored in dictionary (map) format.
I have already concatenated the data into a single column, but I am unable to store it in dictionary format. Please see the attached screenshot below for more details. Let me know if you need more information.
CodePudding user response:
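In your current situation, you can use str_to_map, which splits the string into pairs on ',' and, by default, splits each pair into key and value on ':':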
from pyspark.sql import functions as F
df = spark.createDataFrame([("datatype:0,length:1",)], ['region_validation_check_status'])
df = df.withColumn(
    'region_validation_check_status',
    F.expr("str_to_map(region_validation_check_status, ',')")
)
df.show(truncate=0)
# +------------------------------+
# |region_validation_check_status|
# +------------------------------+
# |{datatype -> 0, length -> 1}  |
# +------------------------------+
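To make the splitting behaviour easier to see, here is a plain-Python sketch of roughly what str_to_map does to one string. The helper name str_to_map_model is hypothetical and this is not Spark's implementation: it ignores that Spark treats the delimiters as regular expressions and how Spark handles duplicate keys. It does show that the resulting values stay strings, so the map above is map<string, string>.

```python
def str_to_map_model(text, pair_delim=",", key_value_delim=":"):
    """Hypothetical plain-Python model of str_to_map for a single string."""
    result = {}
    for pair in text.split(pair_delim):
        # Split each pair on the first key/value delimiter only.
        key, sep, value = pair.partition(key_value_delim)
        # A pair without a key/value delimiter gets a null (None) value.
        result[key] = value if sep else None
    return result


parsed = str_to_map_model("datatype:0,length:1")
print(parsed)  # {'datatype': '0', 'length': '1'}
```

If the values are needed as integers, they would still have to be cast after the map is built.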
If you did not have a string yet, you could build the map directly from the column values with to_json and from_json:
from pyspark.sql import functions as F
df = spark.createDataFrame([(1, 2), (3, 4)], ['a', 'b'])
df.show()
# +---+---+
# |  a|  b|
# +---+---+
# |  1|  2|
# |  3|  4|
# +---+---+
df = df.select(
    F.from_json(F.to_json(F.struct('a', 'b')), 'map<string, int>')
)
df.show()
# +----------------+
# |         entries|
# +----------------+
# |{a -> 1, b -> 2}|
# |{a -> 3, b -> 4}|
# +----------------+