I'm trying to create a DataFrame with an array column in PySpark, as shown below, but it fails with a schema inference error:
data = array([-1.01835623e-01, -2.81103030e-02, 9.39835608e-01, 1.45413309e-01,
3.11870694e-01, 4.00573969e-01, -2.64698595e-01, -4.19898927e-01,
-1.18507199e-01, -3.59607369e-01, 4.42910716e-02, 6.56066418e-01,
2.20986709e-01, -4.60361429e-02, -4.06525940e-01, -2.33521834e-01])
column = ['feature']
from pyspark.sql.types import StructType, StructField, LongType
schema = StructType([StructField("feature", LongType(), True)])
dataframe = spark.createDataFrame(data, column, schema)
dataframe.show()
**TypeError: Can not infer schema for type: <class 'numpy.float32'>**
Should I transform the data with NumPy first, or does anyone have a hint?
CodePudding user response:
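Declaring the column as ArrayType(DoubleType()) worked for me. LongType is an integer type, so it does not fit these values, and here the values are passed as a plain Python list of floats rather than as NumPy scalars: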
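from pyspark.sql.types import StructType, StructField, StringType, ArrayType, DoubleType
# One row: a string ID plus a plain Python list of floats
data = [('1', [-1.01835623e-01, -2.81103030e-02, 9.39835608e-01, 1.45413309e-01,
    3.11870694e-01, 4.00573969e-01, -2.64698595e-01, -4.19898927e-01,
    -1.18507199e-01, -3.59607369e-01, 4.42910716e-02, 6.56066418e-01,
    2.20986709e-01, -4.60361429e-02, -4.06525940e-01, -2.33521834e-01])]
schema = StructType([StructField("ID", StringType(), True),
    StructField("feature", ArrayType(DoubleType()), True)])
# `spark` is the existing SparkSession, as in the question
df = spark.createDataFrame(data, schema)
df.show(truncate=False)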
--- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|ID |feature |
--- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|1 |[-0.101835623, -0.028110303, 0.939835608, 0.145413309, 0.311870694, 0.400573969, -0.264698595, -0.419898927, -0.118507199, -0.359607369, 0.0442910716, 0.656066418, 0.220986709, -0.0460361429, -0.40652594, -0.233521834]|
--- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
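If the values start out as a NumPy array, as in the question, one possible way to get from there to the list-based input above is to call `.tolist()` on the array, which returns plain Python floats. The following is only a sketch: the helper names `to_spark_row` and `build_dataframe` are hypothetical, and `spark` is assumed to be an already created SparkSession.

```python
import numpy as np


def to_spark_row(row_id, values):
    """Return (row_id, list of Python floats) built from a NumPy array.

    Hypothetical helper: ndarray.tolist() converts NumPy scalars such as
    numpy.float32 into built-in Python floats.
    """
    return (row_id, np.asarray(values).tolist())


def build_dataframe(spark, rows):
    """Create a DataFrame with a string ID and an array<double> column.

    Hypothetical helper; `spark` is assumed to be an existing SparkSession.
    pyspark is imported here so the conversion helper works without it.
    """
    from pyspark.sql.types import (
        ArrayType, DoubleType, StringType, StructField, StructType,
    )

    schema = StructType([
        StructField("ID", StringType(), True),
        StructField("feature", ArrayType(DoubleType()), True),
    ])
    return spark.createDataFrame(rows, schema)


# Example input: a float32 array like the one in the question (shortened)
features = np.array([-1.01835623e-01, -2.81103030e-02, 9.39835608e-01],
                    dtype=np.float32)
rows = [to_spark_row("1", features)]
```

With a live session, `build_dataframe(spark, rows).show(truncate=False)` would display the result. Because the source values are float32, the widened doubles may show a few extra digits compared with the literals typed above.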