Asking for a strategy to dynamically update Spark broadcast variables

Time: 10-13

The value of my broadcast variable is read from Redis. When the data in Redis changes, I want to update the Spark broadcast variable too. Has anyone dealt with a similar problem? Could you share how you handled it?

Reply:

Bumping my own thread first. Most of what I found on Baidu relies on broadcast.doUnpersist(true). However, that call runs on the driver side, so the Spark program executes it only once. When the executors' tasks compute the RDD, they never run broadcast.doUnpersist(true) again.

So I'm wondering whether Spark has a mechanism that can forcibly interrupt the executors, return to the driver side to do what I need, and then let the executors resume their tasks. This probably violates some Spark principles, but I'd like a similar effect. Does anyone have relevant experience to share?

Reply:

After experimenting: in Spark Streaming on YARN, the broadcast variable fails with a null pointer exception, so I'm going with a different approach.

Reply:

You can create a thread on the Spark driver side that periodically calls sc.broadcast. I tested it and it works.
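The following is a minimal sketch of such a driver-side thread, built on several assumptions. PeriodicRefresher and its load/release parameters are hypothetical names. In a real job, load would read Redis and call sc.broadcast, and release would call unpersist on the old broadcast. Spark is left out so that the swap logic runs on its own.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicReference
import scala.util.control.NonFatal

// Hypothetical driver-side holder. In a Spark job, load() would read Redis and
// call sc.broadcast(...), and release(old) would call old.unpersist(false).
final class PeriodicRefresher[T](load: () => T, release: T => Unit) {
  private val current = new AtomicReference[T](load())

  // Latest value; read it on the driver each time a job or batch is built.
  def get: T = current.get()

  // Builds a new value, publishes it, then releases the previous one.
  def refreshNow(): Unit = {
    val fresh = load()
    val old = current.getAndSet(fresh)
    release(old)
  }

  // Starts a daemon thread that calls refreshNow() every `period` units.
  def start(period: Long, unit: TimeUnit): ScheduledExecutorService = {
    val factory: ThreadFactory = (r: Runnable) => {
      val t = new Thread(r, "broadcast-refresher")
      t.setDaemon(true)
      t
    }
    val scheduler = Executors.newSingleThreadScheduledExecutor(factory)
    // A task that throws would cancel all future runs, so failures are only logged.
    val task: Runnable = () => {
      try refreshNow()
      catch { case NonFatal(e) => System.err.println(s"refresh failed: $e") }
    }
    scheduler.scheduleAtFixedRate(task, period, period, unit)
    scheduler
  }
}
```

The thread only swaps the reference. The job still has to read get on the driver each time it builds a batch. A closure that already captured an older broadcast keeps using that one.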

Reply:

In YARN mode it comes back null.

Reply:

Quoting JDJWXJ's reply on floor 3:
You can create a thread on the Spark driver side that periodically calls sc.broadcast. I tested it and it works.
Does this work in YARN mode? I tried it and got a null pointer exception! Could you post your working code so I can take a look?

Reply:

Quoting weixin_41779648's reply on floor 5:
Quoting JDJWXJ's reply on floor 3:
You can create a thread on the Spark driver side that periodically calls sc.broadcast. I tested it and it works.
Does this work in YARN mode? I tried it and got a null pointer exception! Could you post your working code so I can take a look?

Has the original poster solved this? I ran into the same problem. I tried instantiating the variable inside an object and serializing it for the broadcast, but it still doesn't work.

Reply:

In the end I didn't pass this variable through a broadcast. I used a global variable instead.
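A global variable only works across executors if each JVM loads its own copy. If the driver assigns a value to a field of a Scala object, executors do not see that value, because each executor JVM initializes its own copy of the object. Below is a minimal sketch of a per-JVM cache that reloads itself after a time-to-live. TtlCache is a hypothetical name, loader stands in for a Redis or file read, and clock is injectable only for testing.

```scala
// Hypothetical per-JVM cache: each JVM (driver or executor) keeps its own copy
// and reloads it with loader() once it is older than ttlMillis.
final class TtlCache[T](ttlMillis: Long, loader: () => T,
                        clock: () => Long = () => System.currentTimeMillis()) {
  private var value: Option[T] = None
  private var loadedAt: Long = 0L

  // Returns the cached value, reloading first if it is missing or expired.
  def get: T = synchronized {
    val now = clock()
    if (value.isEmpty || now - loadedAt > ttlMillis) {
      value = Some(loader())
      loadedAt = now
    }
    value.get
  }
}
```

In a job, the cache would typically sit in a top-level Scala object so that each executor creates it on first use. The cost is one load per JVM per TTL period, rather than one broadcast per update.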

Reply:

Quoting eeff's reply on floor 6:
Quoting weixin_41779648's reply on floor 5:

Quoting JDJWXJ's reply on floor 3:
You can create a thread on the Spark driver side that periodically calls sc.broadcast. I tested it and it works.
Does this work in YARN mode? I tried it and got a null pointer exception! Could you post your working code so I can take a look?

Has the original poster solved this? I ran into the same problem. I tried instantiating the variable inside an object and serializing it for the broadcast, but it still doesn't work.
No! I'm using a global variable now.

Reply:

Since you're reading it from Redis anyway, why not read it directly from Redis every time you use it? Why broadcast at all?
We all use Redis-backed variables instead of broadcasting. Doing it the other way around, as you are, doesn't make much sense.
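If you read from Redis directly, a common refinement is to fetch once per partition rather than once per record, as you would inside rdd.mapPartitions. The following is a JDK-only sketch. PartitionLookup and enrichPartition are hypothetical names, and fetch stands in for a single Redis read.

```scala
object PartitionLookup {
  // Enriches the records of one partition using a single lookup fetch.
  // In Spark this body would sit inside rdd.mapPartitions { rows => ... };
  // fetch is a hypothetical stand-in for one Redis read.
  def enrichPartition(rows: Iterator[String],
                      fetch: () => Map[String, String]): Iterator[(String, String)] = {
    val lookup = fetch() // one round trip per partition, not per record
    rows.map(key => key -> lookup.getOrElse(key, "unknown"))
  }
}
```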

Reply:

Quoting LinkSe7en's reply on floor 9:
Since you're reading it from Redis anyway, why not read it directly from Redis every time you use it? Why broadcast at all?
We all use Redis-backed variables instead of broadcasting. Doing it the other way around, as you are, doesn't make much sense.

Because I don't want to fetch the data from Redis over and over, but I still want it refreshed regularly. That's why I want a broadcast variable that can be modified dynamically.

Reply:

Has the original poster solved this? I tried reading a local file, serializing its contents, broadcasting them as a broadcast variable, and refreshing it periodically, also in Spark on YARN mode. Right now the updated file values are never picked up. Even stranger, after submitting the job with spark-submit, the values read from the file are still the ones from before the update. Something is wrong, but I can't tell where. My code is as follows:
 
```scala
import java.io.{FileInputStream, ObjectInputStream, ObjectOutputStream}
import java.util.{Calendar, Date, Properties}

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object BroadcastWrapper {
  @volatile private var broadcast: Broadcast[List[String]] = null
  private var lastUpdatedTime: Date = Calendar.getInstance.getTime()

  // Parse the "columns" entry of the configuration file
  def getProperties(filePath: String = "/home/XXX/test.properties"): List[String] = {
    val fileStream = new FileInputStream(filePath)
    try {
      val prop = new Properties()
      prop.load(fileStream)
      val value = prop.getProperty("columns").split(",").toList
      println("value is *******************" + value)
      value
    } finally {
      fileStream.close() // release the file handle even if parsing fails
    }
  }

  def getInstance(sc: SparkContext, filePath: String = "/home/z672898/LZW/test.properties"): Broadcast[List[String]] = {
    if (broadcast == null) {
      synchronized {
        if (broadcast == null)
          broadcast = sc.broadcast(getProperties(filePath))
      }
    }
    broadcast
  }

  def updateAndGet(sc: SparkContext, block: Boolean = false, filePath: String = "/home/z672898/LZW/test.properties"): Broadcast[List[String]] = {
    val currentTime = Calendar.getInstance().getTime
    // 1 min = 60 s = 60000 ms
    val date_diff = currentTime.getTime - lastUpdatedTime.getTime
    // Refresh once more than 60000 ms (1 min) have passed since the last update
    if (broadcast == null || date_diff > 60000) {
      if (broadcast != null) {
        /**
         * unpersist(blocking): removes the broadcast variable's cached copies from the memory of every worker node that holds one.
         * The Boolean blocking parameter controls whether the call blocks until the variable has been removed from all nodes, or runs as an asynchronous, non-blocking operation.
         * To free the memory immediately, set this parameter to true.
         */
        broadcast.unpersist(block)
      }
      val columns = getProperties(filePath)
      println("other broadcast: ****************" + columns)
      broadcast = sc.broadcast(columns)
      // Record the update time
      lastUpdatedTime = Calendar.getInstance().getTime
    }
    broadcast
  }

  // Read/write serialization.
  // Note: Java serialization only invokes writeObject/readObject when they are private
  // methods of a Serializable class; here they are ordinary methods on a Scala object.
  def writeObject(out: ObjectOutputStream): Unit = {
    out.writeObject(broadcast)
  }

  def readObject(in: ObjectInputStream): Unit = {
    broadcast = in.readObject().asInstanceOf[Broadcast[List[String]]]
  }
}
```

Reply:

This is not possible with broadcast variables. Can you describe your specific requirement? Then we can see whether there's another solution.

Reply:

It can be done with broadcast variables, but only inside the dstream.foreachRDD operator. Otherwise you get a null pointer exception.
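One plausible reading of why foreachRDD works: the outer function of foreachRDD runs on the driver once per batch. A fresh broadcast handle can therefore be fetched there, and that batch's tasks capture it. The following is a JDK-only simulation of that flow with no Spark classes. PerBatchRefresh, runBatch and currentColumns are hypothetical names, and currentColumns stands in for a call such as BroadcastWrapper.updateAndGet(sc).

```scala
object PerBatchRefresh {
  // Simulates one micro-batch of dstream.foreachRDD { rdd => ... }.
  // The body runs on the driver once per batch, so currentColumns() is called
  // there; the filter closure is what would be shipped to executors, and it
  // only sees the snapshot taken for this batch.
  def runBatch(records: Seq[String], currentColumns: () => Set[String]): Seq[String] = {
    val snapshot = currentColumns()           // driver side, once per batch
    records.filter(r => snapshot.contains(r)) // "task" side
  }
}
```

A value captured when the DStream graph is defined would be frozen for the life of the stream. Fetching it inside foreachRDD lets each batch pick up the latest version.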

Reply:

It's possible: update the file from HDFS, and refresh the broadcast variable on a daily schedule.
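To run that daily refresh at a fixed time of day, a scheduler needs the delay until the next run. The following is a minimal sketch using java.time. DailySchedule and delayUntilNext are hypothetical names. The sketch works in local time and ignores daylight-saving shifts.

```scala
import java.time.{Duration, LocalDateTime, LocalTime}

// Computes how long to wait until the next daily refresh at `at` (e.g. 02:00).
// Pass the result's toMillis as the initial delay to scheduleAtFixedRate,
// with a period of TimeUnit.DAYS.toMillis(1) milliseconds.
object DailySchedule {
  def delayUntilNext(now: LocalDateTime, at: LocalTime): Duration = {
    val todayRun = now.toLocalDate.atTime(at)
    val next = if (now.isBefore(todayRun)) todayRun else todayRun.plusDays(1)
    Duration.between(now, next)
  }
}
```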