After watching a couple of talks and reading some blog posts about Apache Beam and its API/SDK, I still can't get my head around whether it is also capable of integrating with distributed machine learning training paradigms (e.g. data-parallel training).
The content I found online usually only covers how Beam can help build an end-to-end ML pipeline (from pre-processing to serving), but leaves out whether the training itself can also be done in parallel.
So my question is: can one actually integrate it with, say, TensorFlow's distributed training libraries, like tf.distribute.MultiWorkerMirroredStrategy?
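For context, here is a minimal sketch of the kind of synchronous data-parallel training I mean (this assumes each worker has a TF_CONFIG environment describing the cluster; the model and dataset are just placeholders):

```python
import tensorflow as tf

# Each worker runs this same script; TF_CONFIG tells it its role in the cluster.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables are mirrored across workers; gradients are all-reduced each step.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# 'dataset' would be a tf.data.Dataset automatically sharded across workers:
# model.fit(dataset, epochs=5)
```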
CodePudding user response:
There's not currently a good way to do this kind of training in Beam, because Beam's model assumes that all worker machines in a given step are independent of each other, and it enforces a directed acyclic graph (so there's no easy way to do native iterative training either).
This could change eventually, but in the short term Beam is better suited to more "embarrassingly parallel" operations such as inference, pre/post-processing, and some specific forms of training where each unit of work fits on a single machine (e.g. training many per-entity models, or some kinds of online training).
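To illustrate the "many per-entity models" case: a minimal sketch, assuming each entity's data fits in memory on one worker, that groups rows by an entity key and fits an independent model per key. The field names and the least-squares fit here are placeholders for whatever single-machine trainer you actually use.

```python
import apache_beam as beam
import numpy as np

def train_per_entity_model(element):
    """Fit a tiny model for a single entity's data on one worker."""
    entity_id, rows = element
    x = np.array([r["x"] for r in rows])
    y = np.array([r["y"] for r in rows])
    # Ordinary least squares as a stand-in for any single-machine trainer.
    slope, intercept = np.polyfit(x, y, 1)
    return entity_id, {"slope": float(slope), "intercept": float(intercept)}

with beam.Pipeline() as p:
    (
        p
        | "ReadRows" >> beam.Create([
            {"entity": "a", "x": 1.0, "y": 2.0},
            {"entity": "a", "x": 2.0, "y": 4.1},
            {"entity": "b", "x": 1.0, "y": 0.9},
            {"entity": "b", "x": 2.0, "y": 2.0},
        ])
        | "KeyByEntity" >> beam.Map(lambda r: (r["entity"], r))
        | "GroupByEntity" >> beam.GroupByKey()
        | "TrainPerEntity" >> beam.Map(train_per_entity_model)
        | "Print" >> beam.Map(print)
    )
```

Each per-entity fit is independent, so Beam can spread them across workers freely; the coordination that a strategy like MultiWorkerMirroredStrategy needs between workers during a single training run is exactly what Beam's model doesn't provide.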