InferSchema numpy.float32 PySpark

Time:09-13

I'm trying to create a DataFrame from an array in PySpark, as shown below, but it fails with a schema inference error:

data = array([-1.01835623e-01, -2.81103030e-02,  9.39835608e-01,  1.45413309e-01,
        3.11870694e-01,  4.00573969e-01, -2.64698595e-01, -4.19898927e-01,
       -1.18507199e-01, -3.59607369e-01,  4.42910716e-02,  6.56066418e-01,
        2.20986709e-01, -4.60361429e-02, -4.06525940e-01, -2.33521834e-01])

column = ['feature'] 

from pyspark.sql.types import StructType, StructField, LongType
schema = StructType([StructField("feature", LongType(), True)])

dataframe = spark.createDataFrame(data, column, schema)
dataframe.show()

**TypeError: Can not infer schema for type: <class 'numpy.float32'>**

Should I transform the data with NumPy first, or does anyone have a hint?

CodePudding user response:

Declaring the column as ArrayType(DoubleType()) and passing the values as a plain Python list worked for me:

from pyspark.sql.types import StructType, StructField, StringType, ArrayType, DoubleType

# One row: an ID plus the feature vector as a plain Python list of floats
data = [('1',[-1.01835623e-01, -2.81103030e-02,  9.39835608e-01,  1.45413309e-01,
        3.11870694e-01,  4.00573969e-01, -2.64698595e-01, -4.19898927e-01,
       -1.18507199e-01, -3.59607369e-01,  4.42910716e-02,  6.56066418e-01,
        2.20986709e-01, -4.60361429e-02, -4.06525940e-01, -2.33521834e-01])]

schema = StructType([StructField("ID", StringType(), True),
    StructField("feature", ArrayType(DoubleType()), True)])

df = spark.createDataFrame(data, schema)
df.show(truncate=False)

+---+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|ID |feature                                                                                                                                                                                                                   |
+---+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1  |[-0.101835623, -0.028110303, 0.939835608, 0.145413309, 0.311870694, 0.400573969, -0.264698595, -0.419898927, -0.118507199, -0.359607369, 0.0442910716, 0.656066418, 0.220986709, -0.0460361429, -0.40652594, -0.233521834]|
+---+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
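
If the values start out as a NumPy array, as in the question, they probably need to be converted to built-in Python floats before Spark sees them, since the error comes from Spark not recognising numpy.float32. Here is a minimal sketch of that conversion. The helper names (to_array_row, to_scalar_rows, build_dataframe) are made up for illustration, and `spark` is assumed to be an existing SparkSession.

```python
import numpy as np


def to_array_row(values, row_id="1"):
    """Return one (id, list-of-float) row for an ArrayType(DoubleType()) column.

    ndarray.tolist() turns every NumPy scalar (e.g. numpy.float32) into a
    built-in Python float.
    """
    return [(row_id, np.asarray(values).tolist())]


def to_scalar_rows(values):
    """Return one single-field row per value, for a plain DoubleType() column."""
    return [(v,) for v in np.asarray(values).tolist()]


def build_dataframe(spark, values):
    """Build the two-column (ID, feature) DataFrame from a NumPy vector.

    `spark` is assumed to be an existing SparkSession.
    """
    # Imported here so the two helpers above can be used without PySpark.
    from pyspark.sql.types import (
        ArrayType, DoubleType, StringType, StructField, StructType,
    )

    schema = StructType([
        StructField("ID", StringType(), True),
        StructField("feature", ArrayType(DoubleType()), True),
    ])
    return spark.createDataFrame(to_array_row(values), schema)
```

With a running session, build_dataframe(spark, data).show(truncate=False) should give a one-row table like the one above. If you would rather have one value per row, pass to_scalar_rows(data) together with a schema whose single field is DoubleType().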