I have a Spark DataFrame created in Scala that contains geographical coordinates. I need to add a column with the country based on these coordinates. I found some Python tools, but as far as I know I can't use them in Scala code. I'm also not sure about efficiency if I have to process every row one by one with a UDF (it's about 50,000 rows). Do you know how I can process this in the fastest possible way?
CodePudding user response:
You can use a Pandas UDF if the tool you found is a Python library. With this, Spark applies your function to whole batches of rows (Arrow-based vectorization) instead of row by row.
https://databricks.com/fr/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
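As a rough illustration, here is a minimal sketch of a Series-to-Series Pandas UDF in PySpark. The column names `lat`/`lon` and the use of the `reverse_geocoder` package (whose `search` call takes `(lat, lon)` tuples and returns records with a `cc` country code) are assumptions; substitute whatever Python geocoding tool you actually found.

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

@pandas_udf(StringType())
def coords_to_country(lat: pd.Series, lon: pd.Series) -> pd.Series:
    # Assumed library: reverse_geocoder does offline reverse geocoding
    # on a whole batch of coordinates at once.
    import reverse_geocoder as rg
    results = rg.search(list(zip(lat, lon)))
    # 'cc' is the ISO country code in reverse_geocoder's result records.
    return pd.Series([r["cc"] for r in results])

# Hypothetical usage, assuming df has 'lat' and 'lon' columns:
df = df.withColumn("country", coords_to_country("lat", "lon"))
```

Because the UDF receives pandas Series covering many rows per call, the per-row Python overhead is amortized, which is usually what makes the difference at ~50,000 rows compared to a plain row-by-row UDF.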