Derive structType schema from list of column names in PySpark


In PySpark, I don't want to hardcode the schema definition; I want to derive it from the variable below.

mySchema=[("id","IntegerType()", True),
          ("name","StringType()", True),
          ("InsertDate","TimestampType()", True)
         ]

result = mySchema.map(lambda l: StructField(l[0],l[1],l[2]))  # doesn't work: a Python list has no .map, and l[1] is a string, not a DataType

How do I implement this logic to generate structTypeSchema from mySchema?

Expected output:

structTypeSchema = StructType(fields=[
                                      StructField("id", IntegerType(), True),
                                      StructField("name", StringType(), True), 
                                      StructField("InsertDate",TimestampType(), True)])

CodePudding user response:

You can try something along these lines:

from pyspark.sql import types as T

# Evaluate each type string (e.g. "IntegerType()") against the types module
# and build one StructField per entry in mySchema.
structTypeSchema = T.StructType(
    [T.StructField(f[0], eval(f'T.{f[1]}'), f[2]) for f in mySchema]
)

or

from pyspark.sql.types import *

# With the wildcard import, names like IntegerType are in scope,
# so eval("IntegerType()") resolves directly.
structTypeSchema = StructType(
    [StructField(f[0], eval(f[1]), f[2]) for f in mySchema]
)
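If you'd rather avoid eval, a minimal sketch that resolves the type names with getattr should also work (assuming every type string in mySchema is the name of a class in pyspark.sql.types followed by "()"):

from pyspark.sql import types as T

# Strip the trailing "()" from the type string, look the class up on the
# types module, and instantiate it -- no eval involved.
structTypeSchema = T.StructType(
    [T.StructField(name, getattr(T, type_str.rstrip("()"))(), nullable)
     for name, type_str, nullable in mySchema]
)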
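For context, here is a small usage sketch with the derived schema; the rows are hypothetical and it assumes an active SparkSession bound to the name spark:

from datetime import datetime

# Hypothetical sample rows matching the (id, name, InsertDate) columns.
rows = [(1, "alice", datetime(2023, 6, 2, 12, 0, 0)),
        (2, "bob",   datetime(2023, 6, 2, 13, 30, 0))]

df = spark.createDataFrame(rows, schema=structTypeSchema)
df.printSchema()
# root
#  |-- id: integer (nullable = true)
#  |-- name: string (nullable = true)
#  |-- InsertDate: timestamp (nullable = true)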