Home > Back-end >  splitting string column into multiple columns based on key value item using spark scala
splitting string column into multiple columns based on key value item using spark scala

Time:11-26

I have a dataframe where one column contains several information in a 'key=value' format. There are almost a 30 different 'key=value' that can appear in that column will use 4 columns for understanding ( _age, _city, _sal, _tag)

id  name   properties
0   A      {_age=10, _city=A, _sal=1000}
1   B      {_age=20, _city=B, _sal=3000, tag=XYZ}
2   C      {_city=BC, tag=ABC}

How can I convert this string column into multiple columns? Need to use spark scala dataframe for it.

The expected output is:

id  name   _age    _city    _sal    tag
0   A      10      A       1000       
1   B      20      B       3000   XYZ
2   C              BC             ABC 

CodePudding user response:

Short answer

df
  .select(
    col("id"), 
    col("name"), 
    col("properties.*"), 
    ..
  )

CodePudding user response:

Try this :

val s = df.withColumn("dummy", explode(split(regexp_replace($"properties", "\\{|\\}", ""), ",")))
    val result= s.drop("properties").withColumn("col1",split($"dummy","=")(0)).withColumn("col1-value",split($"dummy","=")(1)).drop("dummy")
    result.groupBy("id","name").pivot("col1").agg(first($"col1-value")).orderBy($"id").show
  • Related