I have the below column in a table called 'test'.
How can I get the 'id' and 'value' (eg.for 'id' = 2, I should get the value '24' and null for other two ids) from the given table.
The 'data type' for the column 'age' is 'Map' and I'm not sure how to deal with this.
A simple query in Python or SQL or any leads is much appreciated. Also, please advise on the packages to import.
CodePudding user response:
You can get it in sql or python. In Python You can try
agecolumn=age.replace("{","").replace("}","").split("=")
if agecolumn[1].strip():
do domething
CodePudding user response:
explode
function would "explode" your map to key and value pairs, then you can use them in anyway you want.
from pyspark.sql import functions as F
(df
.select('id', F.explode('age').alias('k', 'v'))
.show()
)
--- --- ----
| id| k| v|
--- --- ----
| 2|age| 24|
| 3|age|null|
--- --- ----