Spark DataFrame data becomes 0 after persist (DISK_ONLY)

Time:09-25

The features are read from a Hive table. To apply MinMaxScaler normalization, the feature columns need to be converted to vector type first. Here is the code:
from pyspark import StorageLevel
from pyspark.sql.functions import udf
from pyspark.ml.linalg import Vectors, VectorUDT

diskLevel = StorageLevel.DISK_ONLY
udfunction = udf(lambda column: Vectors.dense(column), VectorUDT())
spark.sql("use itemRecommend")
originalFeatures = spark.sql("select * from feature_table")
columns = originalFeatures.columns
vectorFeatures = originalFeatures
i = 0
for column in columns:
    if column != "tag":
        i = i + 1
        print(column)
        vectorFeatures = vectorFeatures.withColumn(column, udfunction(vectorFeatures[column]))
        # vectorFeatures.persist(storageLevel=diskLevel)
        if i == 20:
            i = 0
            vectorFeatures.persist(storageLevel=diskLevel)
            vectorFeatures.count()
            break

Because there are many feature columns, I want to persist once every 20 columns. But every time after persist, the data in vectorFeatures becomes 0 rows, which is very confusing. Any guidance would be appreciated.