How to change the schema of a Spark DataFrame

Time:01-19

I'm reading a JSON file with spark.read.json, and it automatically gives me a DataFrame with an inferred schema. Is it possible to change the schema of the existing DataFrame to the schema below?

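from pyspark.sql.types import (ArrayType, BooleanType, IntegerType, MapType,
                               StringType, StructField, StructType)

schema = StructType([StructField("_links", MapType(StringType(), MapType(StringType(), StringType()))),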
                     StructField("identifier", StringType()),
                     StructField("enabled", BooleanType()),
                     StructField("family", StringType()),
                     StructField("categories", ArrayType(StringType())),
                     StructField("groups", ArrayType(StringType())),
                     StructField("parent", StringType()),
                     StructField("values", MapType(StringType(), ArrayType(MapType(StringType(), StringType())))),
                     StructField("created", StringType()),
                     StructField("updated", StringType()),
                     StructField("associations", MapType(StringType(), MapType(StringType(), ArrayType(StringType())))),
                     StructField("quantified_associations", MapType(StringType(), IntegerType())),
                     StructField("metadata", MapType(StringType(), StringType()))])

CodePudding user response:

Once you have the schema defined (as in the question), you can pass it to the reader so that the data is loaded with that schema instead of an inferred one:

df = spark.read.json('path_to_json', schema=schema)