Revisiting the question of data locality for Spark on Kubernetes: if the Spark pods are colocated on the same nodes as the HDFS DataNode pods, does data locality work?
The Q&A session here: https://www.youtube.com/watch?v=5-4X3HylQQo seems to suggest it doesn't.
CodePudding user response:
Locality is an issue. Data locality can be partly fixed, but there are still open issues. Here's a deep dive into the issue: video response to your video.
CodePudding user response:
Right, but that 2017 video covers the 2017 fork, Matt, and the 2021 Q&A session above suggests that the fork's data locality fix never made it into Spark on Kubernetes before it went GA.