My data frame has several columns in dictionary (map) format, and they all share the same keys. How can I explode them into rows without using any joins, keeping the key from any one of the columns?
The schema of the data frame is here. The columns that need to be exploded are pct_ci_tr, pct_ci_rn, pct_ci_ttv and pct_ci_comm.
CodePudding user response:
I would do something like this: stack the four map columns into (name, map) rows first, then explode the map.
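from pyspark.sql import functions as F

# stack() emits four rows per input row: `lib` holds the source column
# name and `map_values` holds that column's map. All four maps must have
# the same type. Note: no trailing comma after the last argument.
stacked = df.select(
    "s__",
    F.expr("""
        stack(
            4,
            'pct_ci_tr', pct_ci_tr,
            'pct_ci_rn', pct_ci_rn,
            'pct_ci_ttv', pct_ci_ttv,
            'pct_ci_comm', pct_ci_comm
        ) as (lib, map_values)
    """),
)

# explode() on a map column emits one row per entry, in columns `key` and `value`.
result = stacked.select("s__", "lib", F.explode(F.col("map_values")))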
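If it helps to see the resulting shape without a Spark session, here is a rough plain-Python sketch of what stack followed by explode does to a single row. The sample row, the map keys ("lower", "upper") and the helper name stack_then_explode are made up for illustration; only the column names come from the question.

```python
# Plain-Python sketch of the reshape performed by stack(...) + explode(...).
# Sample data, the keys "lower"/"upper", and the helper name are hypothetical.
MAP_COLUMNS = ["pct_ci_tr", "pct_ci_rn", "pct_ci_ttv", "pct_ci_comm"]


def stack_then_explode(rows, id_col, map_cols):
    """Return one (id, lib, key, value) tuple per map entry."""
    out = []
    for row in rows:
        # stack: one intermediate (lib, map_values) pair per map column
        for lib in map_cols:
            map_values = row[lib]
            # explode (unlike explode_outer) drops null or empty maps
            if not map_values:
                continue
            # explode: one output row per map entry
            for key, value in map_values.items():
                out.append((row[id_col], lib, key, value))
    return out


rows = [
    {
        "s__": "a",
        "pct_ci_tr": {"lower": 0.1, "upper": 0.3},
        "pct_ci_rn": {"lower": 0.2, "upper": 0.4},
        "pct_ci_ttv": {"lower": 0.5, "upper": 0.7},
        "pct_ci_comm": None,  # a null map contributes no rows
    }
]

for record in stack_then_explode(rows, "s__", MAP_COLUMNS):
    print(record)
```

Each output row carries the original s__ value, the name of the column the entry came from, and the map key and value, so no join is needed to line the four columns up again. If rows with null maps should be kept, Spark's explode_outer is the variant to look at.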