I have a dataset with the data structures shown below:
case class AddressData(
addressId: String,
customerId: String,
address: String,
number: Option[Int],
road: Option[String],
city: Option[String],
country: Option[String]
)
case class CustomerDocument(
customerId: String,
forename: String,
surname: String,
address: Seq[AddressData]
)
Schema
root
|-- customerId: string (nullable = true)
|-- forename: string (nullable = true)
|-- surname: string (nullable = true)
|-- accounts: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- customerId: string (nullable = true)
| | |-- accountId: string (nullable = true)
| | |-- balance: long (nullable = true)
|-- address: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- addressId: string (nullable = true)
| | |-- customerId: string (nullable = true)
| | |-- address: string (nullable = true)
| | |-- number: integer (nullable = true)
| | |-- road: string (nullable = true)
| | |-- city: string (nullable = true)
| | |-- country: string (nullable = true)
Sample data:
customerId | forename | surname | address |
---|---|---|---|
IND0222 | Charles | Piper | [[ADR285,IND0222,424, Lexington Avenue, New York, United States of America]] |
I need to check whether a given country (e.g. Canada) appears in the country field of any entry in the address list, and add a new column set to 'True' if the country is present or 'False' if it is not.
I am not sure how to apply a filter condition inside an array of structs to achieve this. Any guidance is appreciated. Thanks.
CodePudding user response:
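The code below worked for me. Selecting address.country yields an array of the country values from the array of structs, and array_contains checks whether the given country is in it.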
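import org.apache.spark.sql.functions.array_contains // the $"..." syntax also needs: import spark.implicits._
val countryFlag = df.withColumn("isPresent", array_contains($"address.country", "Canada"))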
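To make the per-row logic concrete, here is a minimal plain-Scala sketch of what the flag computes for one customer. It leaves out Spark's null handling. The case classes are trimmed-down versions of the ones in the question, and `hasCountry` and `charles` are hypothetical names used only for illustration.

```scala
// Trimmed-down versions of the question's case classes (only the fields used here).
case class AddressData(addressId: String, customerId: String, country: Option[String])
case class CustomerDocument(
  customerId: String,
  forename: String,
  surname: String,
  address: Seq[AddressData]
)

// Hypothetical helper: true if at least one address has the given country.
// Addresses whose country is None never match.
def hasCountry(doc: CustomerDocument, country: String): Boolean =
  doc.address.exists(_.country.contains(country))

// Sample row from the question, reduced to the fields modelled above.
val charles = CustomerDocument(
  "IND0222", "Charles", "Piper",
  Seq(AddressData("ADR285", "IND0222", Some("United States of America")))
)

val inCanada = hasCountry(charles, "Canada")
val inUsa = hasCountry(charles, "United States of America")
```

If the data is held as a typed Dataset[CustomerDocument] rather than an untyped DataFrame, the same predicate could probably be applied inside a map, though the array_contains version above is likely the simpler choice for a DataFrame.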