I have a streaming job that I am running in Apache Flink.
It consists of the following operators:
- Source Function (generates data)
- Filter Function (filters out some data)
- GroupBy and aggregate (groups data based on a key and then runs an aggregation function over them)
- Another filter function
- File Sink (saves output to file)
However, when I submit the job I get the following graph.
As you can see, some steps are merged together, for example the 3rd and 4th. Since I want to measure the throughput and latency of each step, is there a way to place each operator in its own vertex? In this example, that would mean one node for the groupBy-aggregation and another one for the filter function (endsWith).
Thanks in advance! :)
CodePudding user response:
You can do this by disabling operator chaining on the execution environment:
env.disableOperatorChaining();
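However, you should expect this to degrade performance, perhaps significantly, because records passed between unchained operators have to be serialized and handed over between tasks instead of being passed along within a single task. See the question "Flink disableOperatorChaining Performance impact" for more about this.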
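If you only need to separate a few operators, chaining can also be controlled per operator with disableChaining() or startNewChain(), instead of disabling it for the whole job. Below is a minimal sketch of that, not your actual job: the class name, the fromElements source, the sample data, the sum aggregation and the print sink are all stand-ins I made up for illustration. Only the chaining calls are the point.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical job used only to illustrate the chaining controls.
public class ChainingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Option 1: put every operator in its own vertex (job-wide).
        // env.disableOperatorChaining();

        env.fromElements("a-x", "b-y", "a-x")            // stand-in source
           .filter(s -> !s.isEmpty())                    // first filter
           .map(s -> Tuple2.of(s, 1))
           .returns(Types.TUPLE(Types.STRING, Types.INT)) // type hint needed for the lambda
           .keyBy(t -> t.f0)
           .sum(1)                                       // stand-in aggregation
           .filter(t -> t.f0.endsWith("x"))              // second filter
           // Option 2: keep only this filter out of any chain, so it is not
           // merged with the aggregation before it or the sink after it.
           .disableChaining()
           .print();                                     // stand-in sink

        env.execute("chaining-sketch");
    }
}
```

Using startNewChain() in place of disableChaining() would still split the filter from the aggregation, but would allow the operators after it to chain to it. Either way the rest of the job keeps its chaining, so the performance cost is limited to the boundaries you actually want to measure.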