"Val switchable viewer="topic1"
Val group="group1"
Val numThreads=6
Val SSC=new StreamingContext (sc, Seconds (2))
Val SQC=new SQLContext (sc)
Val topicMap=switchable viewer. The split (", "). The map ((_, numThreads. ToInt)). ToMap
Val lines.=KafkaUtils createStream (SSC, con, group, topicMap). The map (_) _2)
Val showLines=lines. The window (Minutes (60))
ShowLines. ForeachRDD (RDD=& gt; {
Val t=SQC. JsonRDD (RDD)
T.r egisterTempTable (" kafka_test ")
})
SSC. Start ()
This is I wrote about the spark streaming read kafka data application, but when large amount of data, will be closed, I want to realize the function of the concurrent, already achieved real-time data, how to do it? Thank you for the
Website have this KafkaUtils. CreateDirectStream
But I will go wrong and use it every time you Received 1 when reading from the channel, the socket has likely had closed
How does this work
CodePudding user response:
Are you even a zookeeper? CreateDirectStream flow pattern is the broker directlyCodePudding user response:
Val numInputDStreams=4Val kafkaDStreams=(1 to numInputDStreams). The map {_=& gt; KafkaUtils. CreateStream (SSC, zkQuorum, group, topicMap). The map (_) _2)}
KafkaDStreams. The map (
Your processing logic
)
Multi-process read kafka
Submitted with this, the back of the digital processing power to decide, according to your cluster every second for each process up to how much data from each partition consumption
- the conf spark. Streaming. Kafka. MaxRatePerPartition=10000
CodePudding user response:
Configure your spark. Streaming. Backpressure. Enabled and spark streaming. Backpressure. InitialRate two parameters