I am trying to extract different tables from a REST API in PySpark, following this link. I want to store the different schemas in one column of a PySpark DataFrame. Here is an example:
import pyspark.sql.functions as F
from pyspark.sql import Row
from pyspark.sql.types import *
A = [{"TableName": "Table1", "Schema": StructType([StructField("a", StringType()), StructField("b", IntegerType())])}
, {"TableName": "Table2", "Schema": StructType([StructField("b", StringType()), StructField("c", IntegerType())])}]
df_A = spark.createDataFrame(A)
I get the following error:
ValueError: Some of types cannot be determined after inferring
Is it possible to achieve this result?
CodePudding user response:
When we use a data type like StructType or StringType in Spark, we are defining what the values in a DataFrame or column look like. It is a description or definition, not a value itself, so you can't store it as a value inside a column.
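For illustration, a StructType is meant to be passed where Spark expects a description of the data, for example as the schema argument of createDataFrame; the sample rows and names below (table1_schema, df) are just made up for the sketch:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# A StructType describes how rows look; here it plays its intended role as a schema
table1_schema = StructType([StructField("a", StringType()), StructField("b", IntegerType())])

# This works: the schema describes the rows, it is not stored inside them
df = spark.createDataFrame([("x", 1), ("y", 2)], schema=table1_schema)
df.printSchema()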
If you really want to save the schemas of the different tables, why not save each one as a string?
A = [{"TableName": "Table1", "Schema": """StructType([StructField("a", StringType()), StructField("b", IntegerType())])"""}
, {"TableName": "Table2", "Schema": """StructType([StructField("b", StringType()), StructField("c", IntegerType())])"""}]
df_A = spark.createDataFrame(A)
df_A.show(truncate=False)
+-----------------------------------------------------------------------------+---------+
|Schema                                                                       |TableName|
+-----------------------------------------------------------------------------+---------+
|StructType([StructField("a", StringType()), StructField("b", IntegerType())])|Table1   |
|StructType([StructField("b", StringType()), StructField("c", IntegerType())])|Table2   |
+-----------------------------------------------------------------------------+---------+
Then you can parse your schema when you create your own UDF.
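For example, a minimal sketch of reading a schema string back and using it to parse a JSON payload from the REST API (the raw DataFrame and its payload column are hypothetical, and eval is only acceptable here because the string was written by your own code, not by an external source):

import pyspark.sql.functions as F
from pyspark.sql.types import *

# Collect the schema string for one table (assumes df_A from above)
schema_str = df_A.filter(F.col("TableName") == "Table1").first()["Schema"]

# Rebuild the StructType from the stored string
table1_schema = eval(schema_str)

# Hypothetical raw DataFrame with a JSON payload column returned by the API
raw = spark.createDataFrame([('{"a": "x", "b": 1}',)], ["payload"])
parsed = raw.withColumn("data", F.from_json("payload", table1_schema))
parsed.select("data.*").show()

If you prefer to avoid eval, you could instead store schema.json() in the column and rebuild the schema with StructType.fromJson(json.loads(...)).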