// Read the data into an RDD
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])
val count = hBaseRDD.count()
println(count)
hBaseRDD.foreach { case (_, result) => {
  // Get the row key
  val key = Bytes.toString(result.getRow)
  // Get the column values by column family and column name
  val name = Bytes.toString(result.getValue("A".getBytes, "name".getBytes))
  val age = Bytes.toInt(result.getValue("A".getBytes, "age".getBytes))
  println("Row key:" + key + " FileName:" + name + " age:" + age)
}}
The job runs successfully, but the details of each row are not printed. The log is as follows:
18/05/14 16:26:50 INFO DAGScheduler: ResultStage 0 (count at test.scala:64) finished in 2.515 s
18/05/14 16:26:50 INFO DAGScheduler: Job 0 finished: count at test.scala:64, took 2.642359 s
and
18/05/14 16:26:50 INFO SparkContext: Starting job: foreach at test.scala:71
18/05/14 16:26:50 INFO DAGScheduler: Got job 1 (foreach at test.scala:71) with 1 output partitions
18/05/14 16:26:50 INFO DAGScheduler: Final stage: ResultStage 1 (foreach at test.scala:71)
18/05/14 16:26:50 INFO DAGScheduler: Parents of final stage: List()
18/05/14 16:26:50 INFO DAGScheduler: Missing parents: List()
18/05/14 16:26:50 INFO DAGScheduler: Submitting ResultStage 1 (NewHadoopRDD[0] at newAPIHadoopRDD at test.scala:60), which has no missing parents
18/05/14 16:26:50 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 897.2 MB)
18/05/14 16:26:50 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1334.0 B, free 897.2 MB)
18/05/14 16:26:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.251.6.153:56001 (size: 1334.0 B, free: 897.6 MB)
18/05/14 16:26:50 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/05/14 16:26:50 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (NewHadoopRDD[0] at newAPIHadoopRDD at test.scala:60) (first 15 tasks are for partitions Vector(0))
18/05/14 16:26:50 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
18/05/14 16:26:50 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 10.124.130.14, executor 2, partition 0, ANY, 4919 bytes)
18/05/14 16:26:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.124.130.14:54410 (size: 1334.0 B, free: 366.3 MB)
18/05/14 16:26:51 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.124.130.14:54410 (size: 29.8 KB, free: 366.3 MB)
18/05/14 16:26:52 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 2453 ms on 10.124.130.14 (executor 2) (1/1)
18/05/14 16:26:52 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/05/14 16:26:52 INFO DAGScheduler: ResultStage 1 (foreach at test.scala:71) finished in 2.454 s
18/05/14 16:26:52 INFO DAGScheduler: Job 1 finished: foreach at test.scala:71, took 2.469872 s
The value shown in red is the total number of rows in the HBase table, but nothing is printed by the foreach that follows. I have been checking for several days and still cannot find the problem.
Any guidance from the experts would be much appreciated.
CodePudding user response:
What on earth is {case (_, result) => {? You are not keeping the values in the result, so what values are you trying to read?
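One possible explanation, offered as an assumption rather than a confirmed diagnosis: the log shows the foreach task running on executor 2 at 10.124.130.14, while the driver's block manager is at 10.251.6.153. A println inside RDD.foreach runs on the executor, so its output would land in that executor's stdout log and not on the driver console. The sketch below converts each row to a String on the executors and brings a bounded number of them back to the driver. The object name PrintHBaseRows and the helpers formatRow and rowsOnDriver are hypothetical names made up for illustration; the column family "A" and the columns "name" and "age" are taken from the question.

```scala
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object PrintHBaseRows {
  // Hypothetical helper: builds the same output line as the question's println.
  def formatRow(key: String, name: String, age: Int): String =
    "Row key:" + key + " FileName:" + name + " age:" + age

  // Hypothetical helper: returns up to `limit` formatted rows to the driver.
  // The rows are mapped to plain Strings first because ImmutableBytesWritable
  // and Result are not java.io.Serializable and cannot be collected directly.
  def rowsOnDriver(hBaseRDD: RDD[(ImmutableBytesWritable, Result)],
                   limit: Int): Array[String] =
    hBaseRDD
      .map { case (_, result) =>
        // This part still runs on the executors.
        val key = Bytes.toString(result.getRow)
        val name = Bytes.toString(
          result.getValue(Bytes.toBytes("A"), Bytes.toBytes("name")))
        val age = Bytes.toInt(
          result.getValue(Bytes.toBytes("A"), Bytes.toBytes("age")))
        formatRow(key, name, age)
      }
      .take(limit) // only `limit` rows are shipped back to the driver
}
```

With this in place, PrintHBaseRows.rowsOnDriver(hBaseRDD, 20).foreach(println) would print on the driver, because the foreach is then called on a local Array and not on the RDD. take is used here in place of collect so that a large table does not exhaust driver memory. If the original println output is what you want, it should also be visible in the executor's stdout log.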
CodePudding user response: