How to implement a real-time big data analysis architecture?

Time:09-20

CodePudding user response:

If you are not going to use Spark for the data processing, why not just set up MySQL master-slave replication and run the statistical analysis on the replica (slave) database?
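A minimal sketch of that approach in Python, assuming a read-only account on the MySQL replica and a hypothetical orders table; the only point is that the heavy statistical query hits the replica rather than the primary:

```python
import pymysql

# Connect to the replica kept in sync via master-slave replication,
# not to the primary that serves online traffic.
conn = pymysql.connect(
    host="mysql-replica",      # hypothetical replica host
    user="readonly",
    password="***",
    database="shop",
)
try:
    with conn.cursor() as cur:
        # Daily order count and GMV, computed entirely on the replica.
        cur.execute(
            """
            SELECT DATE(created_at) AS day,
                   COUNT(*)         AS order_cnt,
                   SUM(amount)      AS gmv
            FROM orders
            GROUP BY DATE(created_at)
            """
        )
        for day, order_cnt, gmv in cur.fetchall():
            print(day, order_cnt, gmv)
finally:
    conn.close()
```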

CodePudding user response:

1. The amount of data will certainly keep growing.
2. How the data is processed depends on the business scenario; for real-time metrics we will still use Spark. Building the metrics directly on MySQL will not keep up with how the business develops.

CodePudding user response:

Quoting the reply from weixin_45456985 (2nd floor):
1. The amount of data will certainly keep growing.
2. How the data is processed depends on the business scenario; for real-time metrics we will still use Spark. Building the metrics directly on MySQL will not keep up with how the business develops.

In that case, the mainstream approach is to consume Kafka twice. One consumer does the real-time metric computation and updates MySQL; the other lands the raw data directly and feeds the offline data-warehouse ETL.
Because real-time computing prioritizes performance, it generally uses fast but approximate statistics such as HyperLogLog, which trade a little accuracy for speed.
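As a rough illustration of the real-time path, here is a minimal PySpark Structured Streaming sketch that reads a Kafka topic and maintains a per-minute unique-user count with approx_count_distinct, which is HyperLogLog-based. The topic name, JSON layout, and console sink are assumptions for the sketch, not part of the original answer.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Requires the spark-sql-kafka connector package on the classpath.
spark = SparkSession.builder.appName("realtime-uv").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "events")                # hypothetical topic
    .load()
)

# Assume each Kafka value is a JSON event containing a user_id field.
parsed = events.select(
    F.get_json_object(F.col("value").cast("string"), "$.user_id").alias("user_id"),
    F.col("timestamp"),
)

# approx_count_distinct is HyperLogLog-based: fast, with a small relative error.
uv_per_minute = (
    parsed.withWatermark("timestamp", "10 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .agg(F.approx_count_distinct("user_id").alias("uv"))
)

query = (
    uv_per_minute.writeStream.outputMode("update")
    .format("console")    # in practice: foreachBatch(...) to upsert into MySQL
    .start()
)
query.awaitTermination()
```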
Accurate statistics come from the offline data warehouse: batch jobs run on a regular schedule and correct the results produced by real-time computing.
In the early stage, as long as the real-time pipeline can keep up with the stream, the offline results only need to verify the real-time results rather than overwrite them; if the error keeps growing, offline intervention is needed to correct the results.
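A sketch of that offline correction step, assuming a Hive warehouse table dw.events and a MySQL reporting database (all names are placeholders): a scheduled batch job computes the exact distinct count and writes it back so it can replace the approximate real-time figure.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("offline-uv-correction")
    .enableHiveSupport()
    .getOrCreate()
)

# Exact (not approximate) distinct count for yesterday's partition.
exact_uv = spark.sql("""
    SELECT dt AS day, COUNT(DISTINCT user_id) AS uv
    FROM dw.events
    WHERE dt = date_sub(current_date(), 1)
    GROUP BY dt
""")

# Load a staging table in MySQL; a follow-up SQL step would swap or upsert it
# over the approximate values written by the streaming job.
(
    exact_uv.write.format("jdbc")
    .option("url", "jdbc:mysql://mysql-host:3306/report")
    .option("dbtable", "daily_uv_corrected")
    .option("user", "etl")
    .option("password", "***")
    .mode("overwrite")
    .save()
)
```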

CodePudding user response:

1: MySQL → binlog → Maxwell → Kafka → HBase API → HBase → Oozie → Hive → Sqoop → MySQL → view
2: MySQL → binlog → Maxwell → Kafka → Logstash → ES → Spark Streaming → MySQL → view
3: MySQL → binlog → Maxwell → Kafka → Spark Streaming → MySQL → view (sketched below)
4: MySQL replica → Spark Streaming → MySQL → view

There is always one you'll like.
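For concreteness, a minimal sketch of option 3 above (MySQL → binlog → Maxwell → Kafka → Spark Streaming → MySQL); the topic name, Maxwell JSON fields, and table and connection details are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("maxwell-cdc").getOrCreate()

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "maxwell")     # Maxwell's default output topic
    .load()
)

# Maxwell publishes each binlog row change as JSON:
# {"database": ..., "table": ..., "type": "insert|update|delete", "data": {...}}
changes = raw.select(
    F.get_json_object(F.col("value").cast("string"), "$.table").alias("table_name"),
    F.get_json_object(F.col("value").cast("string"), "$.type").alias("op"),
    F.get_json_object(F.col("value").cast("string"), "$.data.amount").cast("double").alias("amount"),
)

def write_batch(df, epoch_id):
    # Aggregate each micro-batch of new orders and append the metric row to MySQL.
    (
        df.filter("table_name = 'orders' AND op = 'insert'")
        .agg(F.count("*").alias("orders"), F.sum("amount").alias("gmv"))
        .write.format("jdbc")
        .option("url", "jdbc:mysql://mysql-host:3306/report")
        .option("dbtable", "order_metrics")
        .option("user", "etl")
        .option("password", "***")
        .mode("append")
        .save()
    )

query = changes.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()
```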