Home > Enterprise >  How to implement a BOUNDED source for Flink's batch execution mode?
How to implement a BOUNDED source for Flink's batch execution mode?

Time:12-03

I'm trying to do a Flink (1.12.1) batch job, with the following steps:

  • Custom SourceFunction to connect with MongoDB
  • Do any flatmaps and maps to transform some data
  • Sink it in other MongoDB

I'm trying to run it in a StreamExecutionEnvironment , with RuntimeExexutionMode.BATCH, but the application throws a exception because detect my source as UNBOUNDED... And I can't set it BOUNDED ( it must finish after collect all documents in the mongo collection )

The exception:

    exception in thread "main" java.lang.IllegalStateException: Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'. This combination is not allowed, please set the 'execution.runtime-mode' to STREAMING or AUTOMATIC
        at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
        at org.apache.flink.streaming.api.graph.StreamGraphGenerator.shouldExecuteInBatchMode(StreamGraphGenerator.java:335)
        at org.apache.flink.streaming.api.graph.StreamGraphGenerator.generate(StreamGraphGenerator.java:258)
        at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1958)
        at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1943)
        at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1782)
        at com.grupotsk.bigdata.matadatapmexporter.MetadataPMExporter.main(MetadataPMExporter.java:33)

Some code:

Execution environment

public static StreamExecutionEnvironment getBatch() {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setRuntimeMode(RuntimeExecutionMode.BATCH);
    
    env.addSource(new MongoSource()).print();
    
    return env;
    
}

Mongo source:

public class MongoSource extends RichSourceFunction<Document> {

    private static final long serialVersionUID = 8321722349907219802L;
    private MongoClient mongoClient;
    private MongoCollection mc;
    
    
    @Override
    public void open(Configuration con) {
        mongoClient = new MongoClient(
                new MongoClientURI("mongodb://localhost:27017/database"));
        
        mc=mongoClient.getDatabase("database").getCollection("collection");
        
    }
    
    @Override
    public void run(SourceContext<Document> ctx) throws Exception {
        
        MongoCursor<Document> itr=mc.find(Document.class).cursor();
        while(itr.hasNext())
            ctx.collect(itr.next());
        this.cancel();
        
    }

    @Override
    public void cancel() {
        mongoClient.close();
        
    }

Thanks !

CodePudding user response:

Sources used with RuntimeExecutionMode.BATCH must implement Source rather than SourceFunction. And the sink should implement Sink rather than SinkFunction.

See Integrating Flink into your ecosystem - How to build a Flink connector from scratch for an introduction to these new interfaces. They are described in FLIP-27: Refactor Source Interface and FLIP-143: Unified Sink API.

  • Related