Spark one-hot encoding for many features: how to do it efficiently?

Time:09-21

I have more than 50 features in Spark that need one-hot encoding. How can I do this efficiently?

CodePudding user response:

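import sc.implicits._  // assumes sc is a SparkSession (or SQLContext)
import org.apache.spark.ml
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.ml.linalg.SparseVector

val vectorData = dataRDD
  // enumeration values can be converted to a Double
  .map(x => (enum2Double("has lost", x._1), x._2(0), x._2(1), x._2(2), x._2(3)))
  // ml.feature.LabeledPoint
  .toDF("loss", "gender", "age", "grade", "region")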

// indexing columns
val stringColumns = Array("gender", "age", "grade", "region")
val index_transformers: Array[org.apache.spark.ml.PipelineStage] = stringColumns.map(
  cname => new StringIndexer()
    .setInputCol(cname)
    .setOutputCol(s"${cname}_index")
)
// Add the rest of your pipeline like VectorAssembler and algorithm
val index_pipeline = new Pipeline().setStages(index_transformers)
val index_model = index_pipeline.fit(vectorData)
val df_indexed = index_model.transform(vectorData)

// encoding columns
val indexColumns = df_indexed.columns.filter(x => x contains "index")
val one_hot_encoders: Array[org.apache.spark.ml.PipelineStage] = indexColumns.map(
  cname => new OneHotEncoder()
    .setInputCol(cname)
    .setOutputCol(s"${cname}_vec")
)

val pipeline = new Pipeline().setStages(index_transformers ++ one_hot_encoders)
val model = pipeline.fit(vectorData)

model.transform(vectorData).select("loss", "gender_index_vec", "age_index_vec", "grade_index_vec", "region_index_vec")
  .map(
    x =>
      ml.feature.LabeledPoint(x.apply(0).toString().toDouble, ml.linalg.Vectors.dense(
        x.getAs[SparseVector]("gender_index_vec").toArray ++
        x.getAs[SparseVector]("age_index_vec").toArray ++
        x.getAs[SparseVector]("grade_index_vec").toArray ++
        x.getAs[SparseVector]("region_index_vec").toArray))
  )
Source:
http://blog.csdn.net/pan_haufei/article/details/72903667
Good luck!
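For 50+ columns, one possible variation is to avoid creating one stage per column. This is only a sketch and assumes Spark 3.0 or later, where StringIndexer and OneHotEncoder accept several columns at once through setInputCols/setOutputCols. The object name MultiColumnOneHot, the helper buildPipeline, the output column "features", and the sample rows are all made up for illustration. It also uses VectorAssembler to combine the encoded columns, so the sparse vectors do not have to be expanded with toArray as in the code above.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of the original answer.
object MultiColumnOneHot {

  // Builds a pipeline with three stages in total, no matter how many
  // categorical columns are passed in (assumes Spark 3.0+).
  def buildPipeline(categoricalCols: Array[String]): Pipeline = {
    val indexCols = categoricalCols.map(c => s"${c}_index")
    val vecCols   = categoricalCols.map(c => s"${c}_index_vec")

    // One StringIndexer for all columns.
    val indexer = new StringIndexer()
      .setInputCols(categoricalCols)
      .setOutputCols(indexCols)

    // One OneHotEncoder for all indexed columns.
    val encoder = new OneHotEncoder()
      .setInputCols(indexCols)
      .setOutputCols(vecCols)

    // Concatenate the encoded vectors into a single "features" column.
    val assembler = new VectorAssembler()
      .setInputCols(vecCols)
      .setOutputCol("features")

    new Pipeline().setStages(Array(indexer, encoder, assembler))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("onehot-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows using the column names from the answer above.
    val df = Seq(
      (1.0, "m", "20-30", "a", "north"),
      (0.0, "f", "30-40", "b", "south"),
      (1.0, "f", "20-30", "a", "east")
    ).toDF("loss", "gender", "age", "grade", "region")

    val cols = Array("gender", "age", "grade", "region")
    val encoded = buildPipeline(cols).fit(df).transform(df)
    encoded.select("loss", "features").show(false)

    spark.stop()
  }
}
```

With the column list built from the DataFrame schema (for example, every column except the label), the same three stages should cover all 50+ features. On Spark 2.3 or 2.4 the multi-column encoder was named OneHotEncoderEstimator, and StringIndexer still took one column per stage, so this sketch would need adjusting there.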