I have a table like this:
company_id | an_array_of_maps
--------------------------------------------------------------
234 | [{"a": "a2", "b": "b2"}, {"a": "a4", "b": "b2"}]
123 | [{"a": "a1", "b": "b1"}, {"a": "a1", "b": "b1"}]
678 | [{"b": "b5", "c": "c5"}, {"b": Null, "c": "c5"}]
and I want to get a table like this (the value of the "a" key in each map):
company_id | array_of_as
--------------------------------------------------------------
234 | ["a2", "a4"]
123 | ["a1", "a1"]
678 | ["b5", Null]
I tried this:
df.withColumn("array_of_as", F.expr("filter(an_array_of_maps, x -> x.a)")).show()
but I get the following error:
AnalysisException: cannot resolve 'filter(`an_array_of_maps`, lambdafunction(namedlambdavariable()['a'], namedlambdavariable()))' due to data type mismatch: argument 2 requires boolean type, however, 'lambdafunction(namedlambdavariable()['a'], namedlambdavariable())' is of string type.;
CodePudding user response:
Got it - filter is the wrong function. It should be:
(df
    .withColumn("array_of_as",
                F.expr("transform(an_array_of_maps, x -> x.a)"))
).show()
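I was not filtering anything. I was transforming the list of maps into a list of the values stored under "a" in each map, hence transform. filter expects its lambda to return a boolean for each element, which is why a lambda returning a string caused the type mismatch error.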
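To make the difference concrete, here is a minimal plain-Python analogy of the two higher-order functions. It is not Spark code: the helpers transform and filter_ are hypothetical stand-ins written for illustration, and using dict.get to return None for a missing key assumes Spark's default (non-ANSI) behaviour of returning null for an absent map key.

```python
# Plain-Python analogy (NOT Spark): the helper names below are hypothetical.
rows = {
    234: [{"a": "a2", "b": "b2"}, {"a": "a4", "b": "b2"}],
    123: [{"a": "a1", "b": "b1"}, {"a": "a1", "b": "b1"}],
    678: [{"b": "b5", "c": "c5"}, {"b": None, "c": "c5"}],
}


def transform(array, fn):
    """Return one output element per input element (a map operation)."""
    return [fn(x) for x in array]


def filter_(array, predicate):
    """Keep only the elements for which the predicate is truthy."""
    return [x for x in array if predicate(x)]


# transform: the lambda returns a value, so each map becomes its "a" value.
# dict.get yields None for a missing key (assumed to mirror Spark's null).
array_of_as = {
    cid: transform(maps, lambda x: x.get("a")) for cid, maps in rows.items()
}

# filter: the lambda must return a boolean, and whole maps are kept or dropped.
only_with_a = {
    cid: filter_(maps, lambda x: "a" in x) for cid, maps in rows.items()
}

print(array_of_as)
print(only_with_a)
```

In this sketch, company 678 comes out as [None, None] under transform because neither of its maps has an "a" key, while filter_ returns whole maps rather than extracted values.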