Background: The company recently needs to build a large data center. The data center will be built in city A, while the application systems that produce the data are located in cities B, C, and D. The data of each subsystem in A, B, and C (scattered across various tables) now needs to be brought to the data center in A for processing. Each subsystem's data in A, B, and C is stored in MSSQL.
Because we have not done much data processing or data center building before, we lack experience and have the following questions. We would be very grateful for everyone's advice:
1. What technology should be used to transmit the data from the application systems to the data center?
2. For the data center's machines, what systems and technologies should be adopted for the operating system, data storage, data processing, and cluster management, and how should the whole thing be architected?
Please point us in the right direction. If that is too much trouble, just pointing out which technology is needed where would be enough. Thank you very much!
CodePudding user response:
I don't know much about the transfer side, but for storage you can use HDFS. For the MSSQL data, you can use master-slave replication to build a replica at the data center, then use Sqoop to import it into HDFS in Parquet format, and let the upper-layer big data applications analyze it through Hive/Spark. Server logs can be collected into HDFS through Flume and then analyzed with ELK (Logstash, Elasticsearch, Kibana); in our case, though, we had Flume sink directly into HBase and accessed and analyzed it with Spark (our data analysis basically revolves around Spark).
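To make the "replica -> Parquet in HDFS -> Spark analysis" flow above more concrete, here is a minimal PySpark sketch. Note it uses Spark's built-in JDBC reader for the ingest step instead of Sqoop (a deliberate swap, so the whole flow fits in one language); all hostnames, database/table names, paths, columns, and credentials below are made-up placeholders, not anything from the original post.

```python
from pyspark.sql import SparkSession

# Minimal sketch: pull one table from the MSSQL replica, land it in HDFS
# as Parquet, then query it with Spark. Requires the Microsoft mssql-jdbc
# driver jar on the Spark classpath (e.g. passed via --jars).
spark = SparkSession.builder.appName("mssql-to-parquet").getOrCreate()

# 1. Ingest: read a subsystem table from the MSSQL replica over JDBC.
#    (The answer uses Sqoop for this step; Spark's JDBC reader is an
#    alternative shown here for illustration.)
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://replica-host:1433;databaseName=appdb")  # placeholder host/db
    .option("dbtable", "dbo.orders")   # placeholder table
    .option("user", "etl_user")        # placeholder credentials
    .option("password", "etl_password")
    .load()
)

# 2. Land the data in HDFS as Parquet, as the answer describes.
orders.write.mode("overwrite").parquet("hdfs://namenode:8020/warehouse/orders")

# 3. Analyze: read the Parquet back and run a simple aggregation,
#    standing in for the Hive/Spark analysis layer ("city" is a
#    placeholder column).
df = spark.read.parquet("hdfs://namenode:8020/warehouse/orders")
df.groupBy("city").count().show()
```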
CodePudding user response:
If you want to build and monitor a large-scale cluster, you can use Ambari. Ambari automatically assembles the Hortonworks distribution of Hadoop (HDP) for you, and it can also install other Hadoop ecosystem components such as HBase, Hive, ZooKeeper, and Spark. In my actual experience, though, it is most compatible with CentOS, so which Linux distribution to standardize on is something you will need to consider. If the data center is really ambitious, you could also build a private cloud (OpenStack) plus containers (Docker), but I don't understand those at all.
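Once Ambari is up, its REST API can also be scripted for basic monitoring alongside the web UI. Below is a minimal Python sketch that asks Ambari for the state of every service in a cluster; the host, the cluster name, and the admin/admin credentials are placeholder assumptions (8080 is Ambari's default port), so adjust them to your deployment.

```python
import requests

# Minimal monitoring sketch against Ambari's REST API (v1).
# "ambari-host", the cluster name "bigcluster", and the admin/admin
# credentials are placeholders.
AMBARI = "http://ambari-host:8080/api/v1"
CLUSTER = "bigcluster"

resp = requests.get(
    f"{AMBARI}/clusters/{CLUSTER}/services",
    params={"fields": "ServiceInfo/state"},  # request only each service's state
    auth=("admin", "admin"),
)
resp.raise_for_status()

# Print each installed service (HDFS, HIVE, HBASE, ZOOKEEPER, SPARK, ...)
# and whether Ambari reports it as STARTED.
for item in resp.json()["items"]:
    info = item["ServiceInfo"]
    print(f'{info["service_name"]}: {info.get("state", "UNKNOWN")}')
```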