The Spark RDD into DataFrame problem

Time:09-19

I want to convert an RDD to a DataFrame. The code works if I use the Person class, but I want to encapsulate the data in a Map or JSON rather than a specific type.
After switching to a Map, however, I get an exception. How can I use the data in this format?

java.lang.ClassCastException: org.apache.spark.sql.types.MapType cannot be cast to org.apache.spark.sql.types.StructType

My program code:

JavaRDD<Map<String, String>> personsRDD = myRDD.map(
    new Function<Tuple2<ImmutableBytesWritable, Result>, Map<String, String>>() {
        @Override
        public Map<String, String> call(Tuple2<ImmutableBytesWritable, Result> tuple) throws Exception {
            Result result = tuple._2();
            String rowkey = Bytes.toString(result.getRow());
            String name = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("name")));
            // String type = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("type")));
            String age = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("age")));
            // Returning a concrete bean like this converts to Row types directly:
            // Person p = new Person();
            // p.setId(rowkey);
            // p.setName(name);
            // p.setAge(age);
            // return p;
            Map<String, String> map = new HashMap<>();
            map.put("id", rowkey);
            map.put("name", name);
            map.put("age", age);
            return map;
        }
    });
DataFrame df = sqlContext.createDataFrame(personsRDD, Map.class);


CodePudding user response:

createDataFrame cannot infer a schema from a Map; you need to create a structure description object, a StructType, and return Row objects instead of Maps.
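A minimal sketch of that approach, assuming the same Spark 1.x setup as the question (an existing SQLContext named sqlContext and the HBase-backed myRDD): map each Result to a Row with RowFactory.create, describe the columns once with a StructType, and pass both to createDataFrame.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import scala.Tuple2;

// Map each HBase Result to a generic Row instead of a Map.
JavaRDD<Row> rowsRDD = myRDD.map(
    new Function<Tuple2<ImmutableBytesWritable, Result>, Row>() {
        @Override
        public Row call(Tuple2<ImmutableBytesWritable, Result> tuple) throws Exception {
            Result result = tuple._2();
            String rowkey = Bytes.toString(result.getRow());
            String name = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("name")));
            String age = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("age")));
            // Field order here must match the schema below.
            return RowFactory.create(rowkey, name, age);
        }
    });

// Describe the columns once with a StructType (the schema createDataFrame needs).
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("id", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("age", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);

DataFrame df = sqlContext.createDataFrame(rowsRDD, schema);
```

The key difference from the Map version is that the schema is stated explicitly rather than inferred from the element class, so no MapType-to-StructType cast is ever attempted. If some columns are not strings, swap in the matching DataTypes entry (e.g. DataTypes.IntegerType) and convert the value before calling RowFactory.create.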