java.lang.String is not valid external type for schema of string


Given the following, which I seem to have done successfully in the past:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val arrayStructData2 = Seq(
  Row("James", 2),
  Row("Alex", 3)
)

val arrayStructSchema2 = new StructType()
  .add("names", new StructType()
    .add("name", StringType)
    .add("extraField", IntegerType))

val df = spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData2), arrayStructSchema2)
df.printSchema()
df.show()

I get this:

...
Caused by: RuntimeException: java.lang.String is not a valid external type for schema of struct<name:string,extraField:int>

I can't see the problem immediately.

CodePudding user response:

For others, as a reminder: the data needed to be Row(Row(...)), so that each entry matches the nested struct, as in:

val arrayStructData2 = Seq(
      Row(Row("James", 2)),
      Row(Row("Alex", 3))
    )

Not such an obvious error, imho.
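
For reference, a minimal sketch putting the corrected data together with the schema from the question (assuming an active SparkSession named spark, as in the original snippet):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Each outer Row is one DataFrame row; the inner Row fills the
// "names" struct declared in the schema.
val arrayStructData2 = Seq(
  Row(Row("James", 2)),
  Row(Row("Alex", 3))
)

val arrayStructSchema2 = new StructType()
  .add("names", new StructType()
    .add("name", StringType)
    .add("extraField", IntegerType))

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(arrayStructData2),
  arrayStructSchema2)

df.printSchema()
df.show()   // now succeeds: each inner Row matches the "names" struct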

CodePudding user response:

When you create the DataFrame with createDataFrame, you register the schema, but nothing is actually evaluated, which is why df.printSchema() works as expected. When you execute df.show(), the DataFrame is evaluated and Spark tries to load the first value you have given it (in this case a String) into a struct, which results in the RuntimeException you're seeing. Here is the Scaladoc for createDataFrame in Spark 3.1.1:

Creates a DataFrame from a java.util.List containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided List matches the provided schema. Otherwise, there will be runtime exception.

It's telling you that you are trying to force a string into a struct.
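
Put differently, there are two ways to make the data line up with the schema. The first answer nests each value inside an inner Row so it matches the declared struct. Alternatively, if the nesting isn't actually required, the schema itself could be flattened so that plain values map to plain fields. A minimal sketch of that variant (assuming the same spark session; the variable names here are just for illustration):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Flat Rows paired with a flat schema: each Row field maps directly to a
// top-level column, so no struct wrapping is needed.
val flatData = Seq(
  Row("James", 2),
  Row("Alex", 3)
)

val flatSchema = new StructType()
  .add("name", StringType)
  .add("extraField", IntegerType)

val flatDf = spark.createDataFrame(
  spark.sparkContext.parallelize(flatData),
  flatSchema)

flatDf.show()   // triggers evaluation; succeeds because every Row matches the schema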
