from pyspark import StorageLevel
from pyspark.sql.functions import udf
from pyspark.ml.linalg import Vectors, VectorUDT

DiskLevel = StorageLevel.DISK_ONLY
# Wrap each scalar value in a one-element dense vector
udfunction = udf(lambda column: Vectors.dense(column), VectorUDT())
spark.sql("use itemRecommend")
OriginalFeatures = spark.sql("select * from feature_table")
columns = OriginalFeatures.columns
VectorFeatures = OriginalFeatures
i = 0
for column in columns:
    if column != "tag":
        i = i + 1
        print(column)
        VectorFeatures = VectorFeatures.withColumn(column, udfunction(VectorFeatures[column]))
        # VectorFeatures.persist(storageLevel=DiskLevel)
        if i == 20:
            i = 0
            VectorFeatures.persist(storageLevel=DiskLevel)
            VectorFeatures.count()
            break
Because the table has many attributes (columns), I want to persist VectorFeatures once after every 20 converted columns. But every time after the persist, the data in VectorFeatures turns into 0. This is very confusing; any guidance would be appreciated.
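The counter-based batching in the loop (increment `i` for every non-`tag` column, persist when it reaches 20) can be isolated into a small plain-Python helper, which makes it easy to check which columns fall between two `persist()` calls without running Spark. This is only a sketch: `batch_columns`, `batch_size` and `skip` are hypothetical names, not part of PySpark. Unlike the loop above, it also returns the trailing partial batch, so the last few columns would not be left unpersisted.

```python
def batch_columns(columns, batch_size=20, skip=("tag",)):
    """Hypothetical helper: split column names into consecutive batches of
    batch_size, ignoring excluded columns such as "tag". Each batch is the
    set of columns converted between two persist() calls."""
    batches = []
    current = []
    for name in columns:
        if name in skip:
            continue  # same effect as `if column != "tag"` in the loop
        current.append(name)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # trailing partial batch
    return batches


cols = ["tag"] + [f"f{k}" for k in range(45)]
print([len(b) for b in batch_columns(cols)])  # [20, 20, 5]

# Hypothetical usage with the DataFrame from the question (requires Spark):
# for batch in batch_columns(VectorFeatures.columns):
#     for column in batch:
#         VectorFeatures = VectorFeatures.withColumn(column, udfunction(VectorFeatures[column]))
#     VectorFeatures.persist(storageLevel=DiskLevel)
#     VectorFeatures.count()  # persist() is lazy; an action materializes the cache
```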