CSDN online training Q&A: Amazon Kinesis best practices for real-time data analysis

On January 8, ZhuangFu, a product development manager at Amazon AWS, gave an online training session titled "Amazon Kinesis real-time data analysis best practices." The session explained how to use the Kinesis architecture to process and analyze real-time data streams. It also used the case of Supercell, currently one of the most popular mobile game developers, to show how Kinesis processes and analyzes huge data streams (for example, user clicks, purchases, and online activity).

To help you review the training and learn how to do data analysis on the AWS cloud platform, CSDN has compiled the Q&A from the end of the session, as follows:

Q1. The Storm community is developing slowly, and many companies prefer Spark as an in-memory computing framework. Does AWS have any plans for Spark?

A: Kinesis targets the same real-time analysis space as Storm, whereas Spark is an in-memory computing framework, so Storm and Spark are somewhat different. For real-time stream processing, if you want to use open source, you can of course use Storm. If you don't want to handle the underlying operations or build a Storm-like framework yourself, use Kinesis. Spark, as an in-memory computing framework, is a replacement for Hadoop MapReduce, and on AWS, EMR supports Spark. So if you are interested in Spark, take a look at the AWS EMR solution. Many of Spark's experiments and much of its architecture were developed on the AWS cloud, so our support for Spark is actually quite good.

Q2. From what you said, EC2 is used to collect the data? But what I see on Amazon Web Services says EC2 is for computing. Could you clarify? Thank you.

A: Earlier I said that EC2 can be used to collect data, so why use Kinesis for this? The two overlap here. Without a packaged, managed service like Kinesis, you have to find and set up your own servers. EC2 is a compute resource, that is, a server: you would have to provision such servers, set up many nodes for data collection, and probably deploy collection tools on them. You can use EC2 with tools from the open-source community, and that is of course no problem. But if you want to spare yourself the operational work on the underlying platform, Kinesis is a better solution.

Q3. Why is the read speed faster than the write speed, while the number of records is so much lower?

A: Let me explain the difference in throughput from two sides. On the write side, a shard can handle 1 MB per second of throughput and 1,000 records written per second. On the read side, total throughput is 2 MB per second, and people find it strange that only 5 read transactions per second are supported. Generally speaking, a written record is small, a few KB; one record might be 2 KB or 3 KB. The data arrives as a large number of small records, so writes need a high TPS (transactions per second). Reads do not need to happen that often: the data that has accumulated is read out in larger batches, so reads do not need that level of TPS. That is the difference between reads and writes.
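The asymmetry described in this answer can be checked with a little arithmetic. The following is a minimal sketch, not an official calculator: the write limits (1 MB/s and 1,000 records/s per shard) and the 2 MB/s read limit come from the answer, while the figure of 5 read calls per second, the 2 KB record size, and the function names are assumptions made for illustration.

```python
# Per-shard figures: the write and read throughput come from the answer above.
# READ_CALLS_PER_SEC and the 2 KB record size are assumptions for illustration.
WRITE_BYTES_PER_SEC = 1_000_000   # 1 MB/s written per shard
WRITE_RECORDS_PER_SEC = 1_000     # 1,000 records/s written per shard
READ_BYTES_PER_SEC = 2_000_000    # 2 MB/s read per shard
READ_CALLS_PER_SEC = 5            # assumed number of read calls per second


def max_write_rate(record_bytes: int) -> int:
    """Records per second one shard can accept for a given record size."""
    by_bytes = WRITE_BYTES_PER_SEC // record_bytes
    return min(WRITE_RECORDS_PER_SEC, by_bytes)


def records_per_read_call(record_bytes: int) -> int:
    """Records each read call must carry to use the full read throughput."""
    bytes_per_call = READ_BYTES_PER_SEC // READ_CALLS_PER_SEC
    return bytes_per_call // record_bytes


record_size = 2_000  # a hypothetical 2 KB record
print(max_write_rate(record_size))         # writes: many small requests
print(records_per_read_call(record_size))  # reads: a few large batches
```

Under these assumptions, writes arrive as hundreds of small requests per second, while each read call carries a batch of a couple of hundred records, which is why the read side gets by with far fewer transactions.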

Q4. I want to collect data over a period of time to form a time window. Can this be done inside Kinesis? If I would otherwise cache the data in memory, can Kinesis guarantee that it will not be lost?

A: As mentioned earlier, incoming data can be accumulated into a time window: you let the data build up for a period of time and then read it out. That is usually the right approach. You wait for a certain time window while data is collected, and then you read it for processing. Put simply, you use Kinesis as a buffer. I can guarantee that data in the buffer will not be lost: we replicate the data across three data centers, so you don't need to worry about losing it. The only thing to pay attention to is that the data window is only 24 hours. You cannot wait more than 24 hours before processing, because we keep the data for only 24 hours. This is very important!
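The consumer-side windowing described in this answer can be sketched as grouping records into fixed (tumbling) windows by timestamp. This is only an illustration in plain Python with no AWS calls: the record shape `(timestamp_seconds, value)`, the function name `tumbling_windows`, and the sample events are assumptions, and the check against the 24-hour figure simply mirrors the retention limit stated in the answer.

```python
from collections import defaultdict

RETENTION_SECONDS = 24 * 60 * 60  # the 24-hour window mentioned in the answer


def tumbling_windows(records, window_seconds):
    """Group (timestamp_seconds, value) pairs into fixed-size time windows.

    Returns a dict mapping each window's start time to the list of values
    whose timestamps fall in [start, start + window_seconds).
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if window_seconds > RETENTION_SECONDS:
        # Data older than the retention period is no longer in the stream.
        raise ValueError("window is longer than the 24-hour retention period")
    windows = defaultdict(list)
    for timestamp, value in records:
        start = (timestamp // window_seconds) * window_seconds
        windows[start].append(value)
    return dict(windows)


# Hypothetical click events: (seconds since some epoch, user id).
events = [(0, "a"), (30, "b"), (61, "c"), (119, "d"), (120, "e")]
for start, values in sorted(tumbling_windows(events, 60).items()):
    print(start, values)
```

A real consumer would fill `events` from the records it reads out of the stream; the grouping logic stays the same.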

Q5. Can Kinesis perform real-time joins?

A: Many people asked how to implement data processing across different applications. This is really a matter of application logic. As I mentioned, after data is read out of Kinesis, you need to write an application to process the collected data. The languages we currently support for writing such applications are Java and Python. With Java and Python you can do simple calculations, so if you want to add, subtract, multiply, divide, aggregate, or analyze the data, you handle that in the application's internal logic. All of this is controlled by the application you write.
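Since the answer leaves joins to the application, here is one way a consumer might join two batches of records it has already read: a simple in-memory hash join. This is a sketch under stated assumptions, not a Kinesis feature: the record fields (`user_id`, `clicks`, `spend`), the function name `hash_join`, and the sample data are all hypothetical.

```python
def hash_join(left, right, key):
    """Inner-join two lists of dict records on a shared key field."""
    # Index the right side by key so each left record is matched quickly.
    index = {}
    for record in right:
        index.setdefault(record[key], []).append(record)
    joined = []
    for record in left:
        for match in index.get(record[key], []):
            # Merge the two records; on a clash the right side's fields win.
            joined.append({**record, **match})
    return joined


# Hypothetical batches read from two different streams.
clicks = [{"user_id": 1, "clicks": 12}, {"user_id": 2, "clicks": 7}]
purchases = [{"user_id": 1, "spend": 30}, {"user_id": 3, "spend": 5}]
print(hash_join(clicks, purchases, "user_id"))
```

Only records whose key appears in both batches survive the join, so in this sample only user 1 is returned.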

Q6. Compared with Storm, is a shard in Kinesis similar to a tuple in Storm? Can the processing format be defined by the worker/bolt?

A: Some say the Storm architecture also has a sharding design. As for the processing format, I am actually not sure what the person asking means by "processing format." How the data is processed downstream is up to you to define, and as we just said, that is done by the worker. A shard simply serves as a buffer for the data stream.

Q7. With Redshift, online expansion requires stopping writes while data is moved to the new cluster. Is there currently a way to make expansion transparent to the business's reads and writes? Also, Redshift has only one leader node acting as the control node. Is it a possible single point of failure?

A: Many people asked about this. After real-time data is collected with Kinesis and processed, it can be loaded into Redshift, which supports online expansion without downtime. Redshift is a very good service. With a data warehouse on traditional physical servers, extending from 5 nodes to 10 nodes means stopping the system and buying 5 more machines, so expansion on physical machines requires downtime. With Redshift you don't have to stop; we can do the expansion for you automatically.

Q8. Can data archived in Glacier be restored? Does Glacier support customers mailing their own disk array (NAS) directly to AWS, to be attached and imported directly into the S3 storage system?

A: Regarding importing into Glacier: Glacier can be regarded as a cold backup service. You can put files aside in it as backups, and you can also retrieve data from the archive again. Because it is cold backup data, retrieval takes 3-5 hours, so you need to allow for that time.

Q9. How much data can Kinesis handle?

A: Someone asked what the maximum data capacity of Kinesis is. As I said earlier, it depends on how many shards you have. Each shard has a throughput of 1 MB per second, so you can scale according to the throughput your front end demands: one shard, two shards, ten, or a hundred. You expand according to the capacity you need. The benefit is that we do not impose a tight limit on how many shards you can have; if you have the demand, we will have the capacity.
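Sizing by shard count, as described in this answer, comes down to a ceiling division. The sketch below is a rough estimate only: it uses the per-shard write figures quoted in this Q&A (1 MB per second and 1,000 records per second), and the function name `shards_needed` and the sample workloads are assumptions for illustration.

```python
import math

# Per-shard write limits quoted in the answers above; treat them as
# illustrative figures rather than current service limits.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1_000


def shards_needed(records_per_sec: float, avg_record_kb: float) -> int:
    """Estimate how many shards are needed to absorb a given write load."""
    mb_per_sec = records_per_sec * avg_record_kb / 1_000  # taking 1 MB = 1,000 KB
    by_bytes = math.ceil(mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    # Whichever limit is hit first decides the shard count; never below one.
    return max(1, by_bytes, by_records)


# Hypothetical workloads: (records per second, average record size in KB).
print(shards_needed(5_000, 2))     # byte throughput is the binding limit
print(shards_needed(20_000, 0.5))  # record count is the binding limit
```

The estimate takes the larger of the two requirements because a shard stops accepting writes as soon as either limit is reached.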

Q10. Can Storm and Kinesis be integrated?

A: Simply put, Kinesis includes the functionality of Storm, so using Kinesis is tantamount to replacing Storm. But if you have already built something with Storm, that is no problem: Kinesis can do the real-time data collection at the front end, and the back-end computation can be handed to Storm. So we support placing Storm behind Kinesis. Of course, DynamoDB can also sit behind Kinesis: we have a connector that makes this very simple, so data processed by Kinesis can be imported directly into DynamoDB.

If you followed the session closely, you may want to try it yourself; the lab guide below can serve as a reference.

Recommended article: Lab guide: use Amazon Kinesis to implement real-time visualization of geographic data

CodePudding user response:

Thank you, I learned a lot.

CodePudding user response:

Is to change it,,,

CodePudding user response:

How to use the Kinesis architecture's real-time data stream processing and analysis capabilities

CodePudding user response:

Learning from this. Thanks for sharing.