Filter array of struct fields in case class

Time:03-25

I have a dataset with the data structures shown below:

case class AddressData(
  addressId: String,
  customerId: String,
  address: String,
  number: Option[Int],
  road: Option[String],
  city: Option[String],
  country: Option[String]
)

case class CustomerDocument(
  customerId: String,
  forename: String,
  surname: String,
  address: Seq[AddressData]
)

Schema:

root
 |-- customerId: string (nullable = true)
 |-- forename: string (nullable = true)
 |-- surname: string (nullable = true)
 |-- accounts: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- customerId: string (nullable = true)
 |    |    |-- accountId: string (nullable = true)
 |    |    |-- balance: long (nullable = true)
 |-- address: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- addressId: string (nullable = true)
 |    |    |-- customerId: string (nullable = true)
 |    |    |-- address: string (nullable = true)
 |    |    |-- number: integer (nullable = true)
 |    |    |-- road: string (nullable = true)
 |    |    |-- city: string (nullable = true)
 |    |    |-- country: string (nullable = true)

Sample data:

customerId  forename  surname  address
IND0222     Charles   Piper    [[ADR285,IND0222,424, Lexington Avenue, New York, United States of America]]

I need to check whether a given country (for example, Canada) appears in the country field of any entry in the address list, and add a new column that is set to true if the country is present and false if it is not.

I am not sure how to apply a filter condition inside an array of structs to achieve this. Any guidance is appreciated. Thanks.
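For reference, the condition being asked about can be written directly over the case classes as an ordinary Scala predicate: a customer matches if any of its addresses has a country equal to the target. The sketch below runs on plain Scala collections, without Spark, so the logic is easy to check. The helper name hasCountry is hypothetical, and the sample values are only loosely based on the sample row above (how that row splits into the struct fields is assumed).

```scala
// Case classes copied from the question.
case class AddressData(
    addressId: String,
    customerId: String,
    address: String,
    number: Option[Int],
    road: Option[String],
    city: Option[String],
    country: Option[String]
)

case class CustomerDocument(
    customerId: String,
    forename: String,
    surname: String,
    address: Seq[AddressData]
)

object CountryCheck {
  // Hypothetical helper: true if any address of the customer has the given country.
  // A None country never matches, and a null address list is treated as empty.
  def hasCountry(c: CustomerDocument, country: String): Boolean =
    Option(c.address).getOrElse(Seq.empty).exists(_.country.contains(country))

  def main(args: Array[String]): Unit = {
    // Sample values assumed from the question's sample row.
    val piper = CustomerDocument(
      "IND0222", "Charles", "Piper",
      Seq(AddressData("ADR285", "IND0222", "424 Lexington Avenue", Some(424),
        Some("Lexington Avenue"), Some("New York"), Some("United States of America")))
    )
    println(hasCountry(piper, "Canada"))                   // false
    println(hasCountry(piper, "United States of America")) // true
  }
}
```

With a typed Dataset[CustomerDocument], a predicate like this could be used inside map or filter, but each row then has to be deserialized into the case class and the lambda is opaque to Spark's optimizer, so a built-in column function is usually preferable.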

Answer:

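The code below worked for me. Selecting "address.country" on an array of structs returns an array containing the country value of each element, and array_contains then checks whether that array includes "Canada".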

import org.apache.spark.sql.functions.{array_contains, col}
val countryFlag = df.withColumn("isPresent", array_contains(col("address.country"), "Canada"))
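One caveat if the new column must be strictly true or false: as far as I know, Spark's array_contains returns null rather than false when the array itself is null, and also when the value is not found but the array contains a null element (for example, an address whose country is null). In Spark the usual fix is to wrap the expression, e.g. coalesce(array_contains(col("address.country"), "Canada"), lit(false)). The plain-Scala model below only illustrates that three-valued behaviour so the cases are easy to see; it is not Spark's code, and the names arrayContains and isPresent are hypothetical.

```scala
object ArrayContainsModel {
  // Models the assumed SQL result: Some(true), Some(false), or None (SQL NULL).
  // `arr` is None when the whole array column is null; an element is None when it is null.
  def arrayContains(arr: Option[Seq[Option[String]]], value: String): Option[Boolean] =
    arr match {
      case None                                 => None        // null array -> null
      case Some(xs) if xs.contains(Some(value)) => Some(true)  // value found
      case Some(xs) if xs.contains(None)        => None        // not found, but a null element -> null
      case Some(_)                              => Some(false) // not found, no nulls
    }

  // Counterpart of coalesce(..., lit(false)): treat SQL NULL as false.
  def isPresent(arr: Option[Seq[Option[String]]], value: String): Boolean =
    arrayContains(arr, value).getOrElse(false)

  def main(args: Array[String]): Unit = {
    val countries = Some(Seq(Some("United States of America"), None))
    println(arrayContains(countries, "Canada")) // None
    println(isPresent(countries, "Canada"))     // false
  }
}
```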