Data Locality in Spark on Kubernetes colocated with HDFS pods

Time:11-25

Revisiting the data locality question for Spark on Kubernetes: if the Spark pods are colocated on the same nodes as the HDFS DataNode pods, does data locality work?

The Q&A session here: https://www.youtube.com/watch?v=5-4X3HylQQo seems to suggest it doesn't.

CodePudding user response:

Locality is an issue. Data locality can be sorta fixed, but there are still very much open issues. Here's a deep dive into the issue: video response to your video.

CodePudding user response:

...right, but that 2017 video link details the 2017 fork, Matt, and the 2021 Q&A session above suggests the data locality fix from the fork never made it into Spark on K8s before it went GA.
