How can a Java daemon call Spark directly and get back the computed result?

Time:10-11

Is there a way for a Java daemon to call Spark directly and receive the computed result back?
What I can do at the moment is have the Java side write to Kafka, Kafka feeds Spark, and after Spark finishes computing it writes the result back to Kafka, where a Java program picks it up.
That feels like a long detour.
Is there any way to call Spark directly and get the return value?
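(For context, here is a minimal sketch of what the Spark side of that Kafka round trip could look like with Structured Streaming; the broker address, topic names, checkpoint path, and the toy transformation are placeholders, not details from this thread.)

    // Spark Structured Streaming job: read requests from one Kafka topic,
    // apply a computation, and write the results to another Kafka topic.
    // Requires the spark-sql-kafka-0-10 connector on the classpath.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.upper;

    public class KafkaRoundTrip {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("kafka-round-trip")
                    .getOrCreate();

            // Requests arriving from the Java daemon (placeholder broker and topic).
            Dataset<Row> requests = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092")
                    .option("subscribe", "spark-requests")
                    .load();

            // Toy computation: upper-case the payload; replace with the real logic.
            Dataset<Row> results = requests
                    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
                    .withColumn("value", upper(col("value")));

            // Results go back to Kafka, where the Java program consumes them.
            StreamingQuery query = results.writeStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092")
                    .option("topic", "spark-results")
                    .option("checkpointLocation", "/tmp/kafka-round-trip-checkpoint")
                    .start();

            query.awaitTermination();
        }
    }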

CodePudding user response:

Actually your current plan is fine. If a web request interacts with Spark synchronously, it will block the servlet thread and throughput will be limited, unless your cluster is powerful enough to finish the response in a short time. I suggest writing the results to Redis: when the front end sends a query to the backend it gets back a request id, that request id is the Redis key, and the front end then polls a result-retrieval interface with that id.
On our side, for ad-hoc Spark queries, the SparkContext is separated from the Web Service, with HA and load balancing implemented through ZooKeeper. The Web Service receives the SQL query request, optimizes the SQL, queries ZK for a free Driver, and sends the SQL to that Driver for execution. The result is written to HDFS, and then the request id is written to Redis as the key with the HDFS path as the value.
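(As a rough illustration of the request-id / Redis polling pattern described above; the class, the Jedis client, and the dispatchToDriver helper are assumptions made for the sketch, not details from the reply.)

    import java.util.UUID;
    import redis.clients.jedis.Jedis;

    public class AsyncQueryService {

        private final Jedis redis = new Jedis("redis-host", 6379);   // placeholder host/port

        // Accept an SQL query, hand it to a Spark driver, and return a request id right away,
        // so the servlet thread is never blocked waiting for Spark.
        public String submit(String sql) {
            String requestId = UUID.randomUUID().toString();
            dispatchToDriver(requestId, sql);   // hypothetical helper: send (id, sql) to a free driver found via ZK
            return requestId;                   // the front end polls with this id
        }

        // Polling endpoint: returns whatever the driver stored under the id
        // (here, the HDFS path of the finished result), or null if not ready yet.
        public String poll(String requestId) {
            return redis.get(requestId);
        }

        private void dispatchToDriver(String requestId, String sql) {
            // Omitted: look up the least-loaded driver in ZooKeeper and send the request over a socket.
        }
    }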

CodePudding user response:

It depends on how you look at Spark. If you treat Spark as a service, you can use Spark's REST client to submit a job. If you want Spark as a dependency integrated into your code, then either way, REST or integration, you can write your own driver and have it return the value you need.
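(The reply mentions Spark's REST client; another way to submit a job programmatically from plain Java, offered here only as a sketch and not something the reply names, is the spark-launcher module. The jar path, main class, and settings below are placeholders.)

    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    public class SubmitJob {
        public static void main(String[] args) throws Exception {
            // Submit a pre-packaged Spark application from a plain Java process.
            SparkAppHandle handle = new SparkLauncher()
                    .setAppResource("/path/to/your-spark-job.jar")   // placeholder jar
                    .setMainClass("com.example.YourDriver")          // placeholder driver class
                    .setMaster("yarn")
                    .setDeployMode("cluster")
                    .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
                    .startApplication();                             // returns immediately, job runs asynchronously

            // Wait for the application to reach a final state. Note that the driver itself
            // must write its result somewhere the caller can read it (HDFS, Redis, a DB, ...).
            while (!handle.getState().isFinal()) {
                Thread.sleep(1000);
            }
            System.out.println("Final state: " + handle.getState());
        }
    }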

CodePudding user response:

Quoting link0007's reply (1st floor):
Actually your current plan is fine. If a web request interacts with Spark synchronously, it will block the servlet thread and throughput will be limited, unless your cluster is powerful enough to finish the response in a short time. I suggest writing the results to Redis: when the front end sends a query to the backend it gets back a request id, that request id is the Redis key, and the front end then polls a result-retrieval interface with that id.
On our side, for ad-hoc Spark queries, the SparkContext is separated from the Web Service, with HA and load balancing implemented through ZooKeeper. The Web Service receives the SQL query request, optimizes the SQL, queries ZK for a free Driver, and sends the SQL to that Driver for execution. The result is written to HDFS, and then the request id is written to Redis as the key with the HDFS path as the value.


Could you explain this in more detail? Is there an example I could refer to? Thanks!

CodePudding user response:

Quoting guoyj520's reply (3rd floor):
Quoting link0007's reply (1st floor):

Actually your current plan is fine. If a web request interacts with Spark synchronously, it will block the servlet thread and throughput will be limited, unless your cluster is powerful enough to finish the response in a short time. I suggest writing the results to Redis: when the front end sends a query to the backend it gets back a request id, that request id is the Redis key, and the front end then polls a result-retrieval interface with that id.
On our side, for ad-hoc Spark queries, the SparkContext is separated from the Web Service, with HA and load balancing implemented through ZooKeeper. The Web Service receives the SQL query request, optimizes the SQL, queries ZK for a free Driver, and sends the SQL to that Driver for execution. The result is written to HDFS, and then the request id is written to Redis as the key with the HDFS path as the value.


Could you explain this in more detail? Is there an example I could refer to? Thanks!

Roughly:
1. Write a Spark Driver, which we call SparkJobServer, deployed in yarn-client mode with DynamicAllocation enabled so Executors are requested dynamically as needed. On startup it creates a PERSISTENT_SEQUENTIAL node under a fixed ZK path (see: http://blog.csdn.net/heyutao007/article/details/38741207) to obtain its own ID, then writes its IP, socket port, load value (how many SQL statements are currently running), and other information under that new node. It starts a socket server listening on that port, receives SQL requests, optimizes the SQL, executes it through SQLContext.sql, incrementing the load value by 1 and decrementing it by 1 when the query finishes, writes the result to the HDFS directory specified in the request message, and sets the corresponding Redis key/value. One detail: a ShutdownHook is used so that when the Driver ends unexpectedly it deletes its own node from ZK, de-registering itself automatically (see the sketch after this list). Multiple SparkJobServers are started on different machines for HA and load balancing.
2. The web backend receives SQL from the front end, generates a request Key with a UUID, looks under the designated ZK path for the available SparkJobServer with the smallest load value, sends it the SQL as a TCP/IP packet in the agreed format, and returns the Key to the front end.
3. The front end takes the Key and polls another REST interface, which queries Redis directly. If a value is found, it is the HDFS path of the result file SparkJobServer produced; the interface reads the file and returns it to the front end. If there is no value yet, the front end keeps polling at intervals until one appears.
4. In addition, a background service monitors the status of the SparkJobServers, and a SparkJobServer can be started or taken offline manually.
This framework of ours has been tested with 10 TB of incremental data and 10,000+ ad-hoc queries; you can adapt it to your own needs.
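(A minimal sketch of the ZooKeeper registration and shutdown hook from step 1, using the plain ZooKeeper client; the ZK path, the "ip:port:load" data format, and the addresses are assumptions, not the original project's code.)

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class SparkJobServerRegistration {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, event -> { });

            // Register this driver under a sequential node, e.g. /sparkjobserver/server-0000000007,
            // storing "ip:port:load" so the web backend can pick the instance with the smallest load.
            // (Assumes the parent path /sparkjobserver already exists.)
            byte[] payload = "10.0.0.12:9099:0".getBytes(StandardCharsets.UTF_8);
            String path = zk.create("/sparkjobserver/server-",
                    payload,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT_SEQUENTIAL);

            // PERSISTENT_SEQUENTIAL nodes are not removed when the session dies, so the reply's
            // shutdown hook is what de-registers the driver when it exits.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                try {
                    zk.delete(path, -1);
                } catch (Exception ignored) {
                    // best-effort cleanup
                }
            }));

            // Omitted: start the socket server, accept SQL requests, run them through
            // SQLContext.sql(...), and update the load value with zk.setData(path, ..., -1).
        }
    }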

CodePudding user response:

A couple of questions for the expert:
1. SparkJobServer is not an offline (one-shot) program, right? How do you keep multiple SparkJobServers up and running?
2. How do you maintain each SparkJobServer's metadata (its IP, socket port, and load value) in ZK?

CodePudding user response:

Quoting u012540384's reply (5th floor):
A couple of questions for the expert:
1. SparkJobServer is not an offline (one-shot) program, right? How do you keep multiple SparkJobServers up and running?
2. How do you maintain each SparkJobServer's metadata (its IP, socket port, and load value) in ZK?

1. It stays up all the time and the Driver does not die. SparkJobServer's SparkContext is thread-safe, and each Driver opens several threads to handle the SQL queries coming from the front end.
2. Use the ZK API! ZK is built for service registration and discovery (see the sketch below).
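(Continuing the assumption that each SparkJobServer stores "ip:port:load" in its ZK node, as in the registration sketch above, here is a sketch of how the web backend could discover the least-loaded instance; the path and data format are assumptions.)

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    public class DriverDiscovery {

        // Pick the registered SparkJobServer with the smallest load value.
        public static String pickLeastLoaded(ZooKeeper zk) throws Exception {
            List<String> children = zk.getChildren("/sparkjobserver", false);
            String best = null;
            int bestLoad = Integer.MAX_VALUE;
            for (String child : children) {
                byte[] data = zk.getData("/sparkjobserver/" + child, false, null);
                String[] parts = new String(data, StandardCharsets.UTF_8).split(":");  // "ip:port:load"
                int load = Integer.parseInt(parts[2]);
                if (load < bestLoad) {
                    bestLoad = load;
                    best = parts[0] + ":" + parts[1];   // "ip:port" to open a socket to
                }
            }
            return best;    // null if no driver is currently registered
        }
    }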

CodePudding user response:

Hello, about the plan of using Java to write to Kafka, Kafka feeding Spark, and Spark writing the result back to Kafka for a Java program to read: could you share an example of that kind so I can learn from it? Thanks!

CodePudding user response:

To the original poster: about using Java to write to Kafka, Kafka feeding Spark, and Spark writing the result back to Kafka for a Java program to read again, could you share an example of that kind so I can learn from it? Thanks!

CodePudding user response:

Use the Spark Thrift Server; you can connect to it through JDBC.
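(A minimal JDBC sketch against the Spark Thrift Server; the Thrift Server speaks the HiveServer2 protocol, so the Hive JDBC driver and a jdbc:hive2 URL are used. Host, port, user, and table name are placeholders.)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ThriftServerQuery {
        public static void main(String[] args) throws Exception {
            // The Spark Thrift Server exposes the HiveServer2 protocol, so the Hive JDBC
            // driver (org.apache.hive.jdbc.HiveDriver) must be on the classpath.
            String url = "jdbc:hive2://thrift-server-host:10000/default";   // placeholder host, default port
            try (Connection conn = DriverManager.getConnection(url, "user", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT count(*) FROM some_table")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            }
        }
    }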
