Question about a Spark Streaming backlog problem

Time:10-08

About the backlog shown in the figures: what causes SchedulingDelay to grow and batches to queue up?
After the application starts, it runs for a few days and then batches begin to pile up. Occasionally it slowly recovers on its own, and restarting the program always brings it back to normal.
In many cases a single batch has an unusually long ProcessingTime, the SchedulingDelay of the following batches suddenly starts to grow, and ProcessingTime then returns to normal.
On the graph it looks like the second figure.


CodePudding user response:

It is not that the batch interval is set too short. With the configuration unchanged, a 2-second batch normally finishes in about 1 second, yet the backlog still occurs.

CodePudding user response:

Look at the last few rows at the bottom of the figure: this is clearly a flood of data causing data skew, while figure 1 shows an average of 5400 events. Consider changing the Spark Streaming configuration to shave the peak.
The key setting is:
spark.streaming.receiver.maxRate=<maximum number of events per second>
For details see:
http://spark.apache.org/docs/latest/configuration.html#spark-streaming
Tune it according to your actual situation.
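
As a rough way to pick a starting value, the cap can be derived from how many events one batch can comfortably handle. The sketch below is only arithmetic: the 2-second batch interval, the single receiver, and the 8000-events-per-batch target are assumed numbers, not values confirmed in this thread. It also assumes a receiver-based source, since spark.streaming.receiver.maxRate limits each receiver separately. For a direct Kafka stream the corresponding setting would be spark.streaming.kafka.maxRatePerPartition, and spark.streaming.backpressure.enabled=true lets Spark adjust the rate on its own.

```python
def max_rate_for(target_events_per_batch, batch_interval_s, num_receivers=1):
    """Per-receiver records/second cap that keeps one batch under the target size."""
    return target_events_per_batch // (batch_interval_s * num_receivers)


def max_events_per_batch(max_rate, batch_interval_s, num_receivers=1):
    """Upper bound on records in one batch when every receiver is capped at max_rate."""
    return max_rate * batch_interval_s * num_receivers


# Assumed values for illustration only.
BATCH_INTERVAL_S = 2      # batch interval in seconds
TARGET_PER_BATCH = 8000   # largest batch the job should ever have to process

rate = max_rate_for(TARGET_PER_BATCH, BATCH_INTERVAL_S)
conf_line = f"spark.streaming.receiver.maxRate={rate}"
print(conf_line)  # pass this to spark-submit with --conf
```

With a cap in place, a flood is spread over several normal-sized batches instead of landing in one batch that takes minutes to process.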

CodePudding user response:

There is a sudden flood peak, and then for a long time afterwards every batch has 0 events. The upstream data is probably arriving late. Can you check with the upstream message publisher to find out what is going on?

CodePudding user response:

Quoting LinkSe7en's reply on the 3rd floor:
There is a sudden flood peak, and then for a long time afterwards every batch has 0 events. The upstream data is probably arriving late. Can you check with the upstream message publisher to find out what is going on?

Thank you for your attention. Sorry, as a novice I did not describe the problem fully and forgot to mention one thing: the two figures are not from the same program, and in the second one the value is 0 whenever there is no data.
The problem is indeed caused by a data flood. What I do not understand is this: the first flood-peak batch took 2 min to process, which increased the SchedulingDelay of the batches behind it, yet their processing time is only 1 s.
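
A small queue model may help explain this behaviour. It is a sketch under assumptions, not a trace of the real job: it assumes a 2-second batch interval, batches that run strictly one at a time (Spark Streaming's default), one 120-second batch, and 1-second batches otherwise. ProcessingTime is measured per batch, so it drops back to 1 s immediately, while SchedulingDelay is the time a batch waits in the queue before it starts.

```python
def simulate(interval, processing_times):
    """Return the scheduling delay of each batch.

    Batch i is generated at time i * interval, and batches are processed
    one at a time in order, so a batch cannot start before the previous
    one has finished.
    """
    delays = []
    free_at = 0.0  # time at which the previous batch finishes
    for i, proc in enumerate(processing_times):
        batch_time = i * interval
        start = max(batch_time, free_at)
        delays.append(start - batch_time)
        free_at = start + proc
    return delays


# Assumed numbers: 2 s interval, one 120 s batch, then 1 s batches.
times = [1.0, 120.0] + [1.0] * 200
delays = simulate(2.0, times)

print(delays[1:5])                      # delay of the slow batch and the next three
print(delays.index(0.0, 2))             # first later batch that starts on time
```

In this model every batch after the slow one still takes 1 s, but the queue only shrinks by (interval - processing time) = 1 s per batch, so it takes well over a hundred batches to drain. If normal processing time is close to the batch interval, the backlog drains very slowly or not at all.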

CodePudding user response:

Quoting D.F oil's reply on the 4th floor:
The problem is indeed caused by a data flood. What I do not understand is this: the first flood-peak batch took 2 min to process, which increased the SchedulingDelay of the batches behind it, yet their processing time is only 1 s.


I suggest enabling FAIR scheduling and capping the number of executors each job may use at (total number of executors / n), so that jobs can run concurrently. Adjust n according to processing efficiency, so that the processing time stays below the time window (batch interval).
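
The effect of letting jobs run concurrently can be sketched with a simple model. Everything here is an assumption for illustration: the 2-second interval, the 120-second slow batch, and the idea that n concurrent jobs each get 1/n of the executors and therefore take about n times as long. In Spark itself, FAIR mode is enabled with spark.scheduler.mode=FAIR; running more than one streaming batch job at a time is usually done through spark.streaming.concurrentJobs, which is not named in this thread and is not listed in the official configuration page, so treat it with care. Concurrent batches may also finish out of order, which matters for stateful or order-sensitive processing.

```python
import heapq


def simulate_concurrent(interval, processing_times, slots):
    """Scheduling delay per batch when up to `slots` batch jobs may run at once."""
    free_at = [0.0] * slots  # min-heap: time at which each slot becomes free
    heapq.heapify(free_at)
    delays = []
    for i, proc in enumerate(processing_times):
        batch_time = i * interval
        # Take the slot that frees up earliest; wait for it if necessary.
        start = max(batch_time, heapq.heappop(free_at))
        delays.append(start - batch_time)
        heapq.heappush(free_at, start + proc)
    return delays


# Assumed numbers: one slot with all executors vs. two slots with half each.
one_slot = simulate_concurrent(2.0, [1.0, 120.0] + [1.0] * 200, slots=1)
two_slots = simulate_concurrent(2.0, [2.0, 240.0] + [2.0] * 200, slots=2)

print(max(one_slot))   # worst scheduling delay with sequential batches
print(max(two_slots))  # worst scheduling delay with two concurrent jobs
```

In this model the slow batch occupies one slot while normal batches keep flowing through the other, so nothing queues up behind it. Note that with two slots each normal batch takes the full 2-second interval, which leaves no headroom; this is why n has to be chosen so that the per-job processing time stays below the batch interval.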