Filter out structs of null values from an array of structs in a Spark DataFrame


I have a column in a DataFrame that is an array of structs. Some of the structs contain all null values, and I would like to filter them out. For example, given the following DataFrame:

+-----------+---------------------------------------------+
|advertiser |brands                                       |
+-----------+---------------------------------------------+
|Advertiser1|[{"id" : "a", "type" : "b", "name" : "c"}]   |
|Advertiser2|[{"id" : null, "type" : null, "name" : null}]|
+-----------+---------------------------------------------+

I would like to filter out the structs with all null values to get:

+-----------+------------------------------------------+
|advertiser |brands                                    |
+-----------+------------------------------------------+
|Advertiser1|[{"id" : "a", "type" : "b", "name" : "c"}]|
|Advertiser2|[]                                        |
+-----------+------------------------------------------+

I'm thinking it's something along these lines, if I can come up with a struct of null values to compare against:

.withColumn(
    "brands",
    when(
        col("brands").equalTo(*emptyStruct?*),
            null
    )
)

CodePudding user response:

You can try the to_json function: a brands array whose struct has all null values will be serialized as [{}].
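
A minimal sketch of that idea, assuming Spark 3.x (where the filter higher-order function exists in org.apache.spark.sql.functions and to_json omits null fields by default); df and cleaned are placeholder names:

import org.apache.spark.sql.functions.{col, filter, to_json}

// Keep only the array elements whose JSON form is not an empty object,
// i.e. drop structs in which every field is null.
val cleaned = df.withColumn(
  "brands",
  filter(col("brands"), b => to_json(b) =!= "{}")
)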

CodePudding user response:

You want to filter the array elements. There is a method in org.apache.spark.sql.functions with the signature:

filter(column: Column, f: Column => Column): Column

which does this for you:

import org.apache.spark.sql.functions.{col, filter}

df.select(
  col("advertiser"),
  filter(col("brands"),
         b =>
           b.getField("id").isNotNull
             && b.getField("type").isNotNull
             && b.getField("name").isNotNull) as "brands"
)
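
With that predicate, the all-null struct under Advertiser2 is dropped and its brands value becomes an empty array, as in the desired output. Note that the condition also removes any struct in which at least one of the three fields is null; to drop only the fully-null structs, join the isNotNull checks with || instead of &&, so an element is kept as long as any field is non-null.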