py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.io.ImmutableBytesWritable
from pyspark import SparkContext, SparkConf
import os

os.environ['JAVA_HOME'] = r'D:\Java\jdk1.8.0_92'

conf = SparkConf().setMaster("local").setAppName("spark_hbase_test")
sc = SparkContext(conf=conf)

host = 'devhadoop3.reachauto.com,devhadoop2.reachauto.com,devhadoop1.reachauto.com'
table = '2:IndexMessage'
conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": table}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
hbase_rdd = sc.newAPIHadoopRDD("org.apache.hadoop.hbase.mapreduce.TableInputFormat",
                               "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
                               "org.apache.hadoop.hbase.client.Result",
                               keyConverter=keyConv, valueConverter=valueConv,
                               conf=conf)
count = hbase_rdd.count()  # count() is an action, so this triggers the actual read
print(count)
CodePudding user response:
The corresponding HBase jars are missing from the classpath. Check whether the HBase client packages are installed, and try connecting to HBase directly from Python first; if a direct connection works, reading it through PySpark should be no problem.
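As a concrete sketch of that suggestion: the ClassNotFoundException means Spark's JVM cannot see the HBase client jars, so they need to be handed to Spark explicitly (for example via the standard `spark.jars` property). The helper below is illustrative only; the jar names and paths are assumptions, not from the original post:

```python
import glob
import os

def find_hbase_jars(lib_dir):
    """Collect the HBase client jars Spark needs on its classpath."""
    patterns = ["hbase-client*.jar", "hbase-common*.jar",
                "hbase-server*.jar", "hbase-protocol*.jar"]
    jars = []
    for pattern in patterns:
        jars.extend(glob.glob(os.path.join(lib_dir, pattern)))
    return sorted(jars)

# Example usage (hypothetical install path -- adjust to your environment):
# jars = find_hbase_jars(r"D:\hbase\lib")
# conf = SparkConf().set("spark.jars", ",".join(jars))
```

Passing the jars through `SparkConf` keeps the script self-describing; the same list could instead be given on the command line with `spark-submit --jars`.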
CodePudding user response:
https://www.cnblogs.com/junle/p/7611540.html may be useful
CodePudding user response:
To connect Python to big-data frameworks, you need the SPARK_HOME or HADOOP_HOME environment variables configured, with the corresponding jar packages available under them.
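Following that advice, the environment variables can also be set from the script itself before the SparkContext is created. The installation paths below are placeholders, not from the original post:

```python
import os

# Placeholder installation paths -- replace with your actual Spark/Hadoop homes.
os.environ.setdefault("SPARK_HOME", r"D:\spark")
os.environ.setdefault("HADOOP_HOME", r"D:\hadoop")

# Spark picks up jars placed under $SPARK_HOME/jars, so another option is
# to copy the HBase client jars into that directory.
spark_jars_dir = os.path.join(os.environ["SPARK_HOME"], "jars")
print(spark_jars_dir)
```

Setting the variables with `setdefault` keeps any values already configured in the shell, so the script works both locally and on a machine where the admin has set them system-wide.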