I'm trying to decode a GEOHASH column into latitude and longitude using the pygeohash library. Below is my code:
import pygeohash as pgh
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# 'add' is the dataframe that holds the GEOHASH column
udf1 = udf(lambda x: pgh.decode(x))
add_latlong = add.withColumn('location', udf1(col('GEOHASH')))
However, I'm getting the result below:
+------------+--------------------+
|     GEOHASH|            location|
+------------+--------------------+
|w284nyv39qzn|[Ljava.lang.Objec...|
|w0zqyr64nt4v|[Ljava.lang.Objec...|
|w2815pb0yfgr|[Ljava.lang.Objec...|
|w281xv1czv1t|[Ljava.lang.Objec...|
|w2r7cvc0m1bz|[Ljava.lang.Objec...|
+------------+--------------------+
I've come across this thread, PySpark UDF Returns [Ljava.lang.Object;@], which suggested using StringType as the second parameter of udf, but I'm still seeing the same result as above. How do I get the latitude and longitude from here?
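(For reference, that StringType attempt presumably looked roughly like this; this is my reconstruction, not the exact code:)

udf1 = udf(lambda x: pgh.decode(x), StringType())
add_latlong = add.withColumn('location', udf1(col('GEOHASH')))
# per the output above, this still shows [Ljava.lang.Objec... rather than coordinates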
Appreciate your help
Update: I've used the solution from Jonathan Lam below; for completeness, here's the code and the resulting dataframe.
from pyspark.sql.types import ArrayType, FloatType

udf1 = udf(lambda x: pgh.decode(x), ArrayType(FloatType()))
add_latlong = (add.withColumn('location', udf1(col('GEOHASH')))
                  .withColumn('Lat', col('location')[0])
                  .withColumn('Long', col('location')[1]))
+------------+--------------------+--------+----------+
|     GEOHASH|            location|     Lat|      Long|
+------------+--------------------+--------+----------+
|w2864utg8uyf|[3.189408, 101.73...|3.189408| 101.73035|
|w281hj25hzre|[3.017675, 101.42...|3.017675|101.425995|
|w2830hj8vzrp|[3.010423, 101.60...|3.010423|101.609375|
|w0zf5uepz8uk|[4.596367, 101.06...|4.596367| 101.06768|
|w2rkk6s97gvt|[2.167289, 111.63...|2.167289| 111.63843|
+------------+--------------------+--------+----------+
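As a variant (a sketch only, not part of Jonathan Lam's answer): the same pgh.decode call can also be wrapped with a struct return type so the decoded fields come back named, avoiding the [0]/[1] indexing; the lat/long field names below are illustrative choices.

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StructType, StructField, FloatType
import pygeohash as pgh

# Returning a struct instead of an array names the decoded fields,
# so the Lat/Long columns can be pulled out by name rather than position.
latlong_schema = StructType([
    StructField('lat', FloatType()),
    StructField('long', FloatType()),
])
decode_udf = udf(lambda gh: tuple(float(v) for v in pgh.decode(gh)), latlong_schema)

add_latlong = (add.withColumn('location', decode_udf(col('GEOHASH')))
                  .withColumn('Lat', col('location.lat'))
                  .withColumn('Long', col('location.long')))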
CodePudding user response:
I'm not sure if your case is the same as the link you provided, since you are using an external package to do the transformation (pgh.decode(x)). Based on the docs:
pgh.decode(geohash='ezs42')
# >>> ('42.6', '-5.6')
I think you should use ArrayType(FloatType()) as the return type instead.
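Applied to your dataframe, a minimal sketch of that fix would look roughly like this (assuming the same add dataframe and GEOHASH column from your question; the float() cast is only a precaution in case your pygeohash version returns strings, as in the doc snippet above):

from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, FloatType
import pygeohash as pgh

# Declaring ArrayType(FloatType()) lets Spark turn the decoded (lat, long)
# pair into a real array column instead of an opaque Java object reference.
decode_udf = udf(lambda gh: [float(v) for v in pgh.decode(gh)], ArrayType(FloatType()))

add_latlong = (add.withColumn('location', decode_udf(col('GEOHASH')))
                  .withColumn('Lat', col('location')[0])
                  .withColumn('Long', col('location')[1]))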