Nested struct in a struct

Time:10-21

Given some rows coming from a SQL data source with a schema like...

| A | B | C | D | E | F |

... I'd like to transform it into:

{
    A: {
        invented: { B, C },
        D,
        E,
        F
    }
}

AFAIK, dataFrame.withColumn won't let me implement such a transformation (it doesn't seem to support nesting a struct inside a first-level struct).

Is this possible at all?

CodePudding user response:

I think the following code should work (if I understood your question correctly):

import org.apache.spark.sql.functions.{col, struct}

df
  .withColumn("nested_struct", struct(
      col("A"),
      struct(
        col("B"),
        struct(
          col("C"),
          struct(col("E"), col("F"))
        ),
        col("D")
      )
    )
  )

CodePudding user response:

First of all, thanks to @partlov for his answer. When I first posted my question, I forgot to mention that one of the nested structs had to be a column that doesn't exist in the source schema.

That said, the issue was very easy to resolve.

My first attempt was:

 dataFrame.WithColumn("invented",
                Struct
                (
                    Struct("invented2", "A")
                ));

But this threw an exception: Spark complained "could not resolve 'invented'" because invented isn't in the schema.

Then I realized I could simply not provide "invented" at all. That worked, but Spark named the resulting column col1. Finally, I aliased col1, and that solved the issue!

 dataFrame.WithColumn("invented",
                Struct
                (
                    Struct("invented2", "A").As("X")
                ));

Note: the sample above is C# code, as I'm using .NET for Spark. It should work the same way in Scala, Python, Java, R...
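
For reference, here is a minimal Scala sketch of the same aliasing idea applied to the structure from the question. It is one reading of the target shape, not a definitive implementation. The object name NestedStructExample, the helper nest, and the sample row are hypothetical. It assumes the original value of A is not needed, because withColumn("A", ...) overwrites that column. The inner struct is named with an alias, since "invented" does not exist in the source schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object NestedStructExample {

  // Replaces column A with a struct shaped like { invented: { B, C }, D, E, F }.
  // "invented" is not a source column: it exists only as the alias of the inner struct.
  def nest(df: DataFrame): DataFrame =
    df.withColumn(
        "A",
        struct(
          struct(col("B"), col("C")).as("invented"),
          col("D"),
          col("E"),
          col("F")
        )
      )
      .select("A") // drop the top-level B..F columns, now duplicated inside A

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-struct-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample row standing in for the SQL data source.
    val df = Seq((1, 2, 3, 4, 5, 6)).toDF("A", "B", "C", "D", "E", "F")

    nest(df).printSchema()
    spark.stop()
  }
}
```

Without the .as("invented") alias, Spark would assign the inner struct a generated field name such as col1, which is the behaviour described above.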
