Kotlin with spark create dataframe from POJO which has pojo classes within

Time:10-09

I have Kotlin data classes as shown below:

data class Persona_Items(
    val key1: Int = 0,
    val key2: String = "Hello")

data class Persona(
    val persona_type: String,
    val created_using_algo: String,
    val version_algo: String,
    val createdAt: Long,
    val listPersonaItems: List<Persona_Items>)

data class PersonaMetaData(
    val user_id: Int,
    val persona_created: Boolean,
    val persona_createdAt: Long,
    val listPersona: List<Persona>)

fun main() {

    val personalItemList1 = listOf(Persona_Items(1), Persona_Items(key2="abc"), Persona_Items(10,"rrr"))
    val personalItemList2 = listOf(Persona_Items(10), Persona_Items(key2="abcffffff"),Persona_Items(20,"rrr"))

    val persona1 = Persona("HelloWorld","tttAlgo","1.0",10L,personalItemList1)
    val persona2 = Persona("HelloWorld","qqqqAlgo","1.0",10L,personalItemList2)
    val personMetaData = PersonaMetaData(884,true,1L, listOf(persona1,persona2))

    val spark = SparkSession
        .builder()
        .master("local[2]")
        .config("spark.driver.host", "127.0.0.1")
        .appName("Simple Application")
        .getOrCreate()

    val rdd1: RDD<PersonaMetaData> = spark.toDS(listOf(personMetaData)).rdd()

    val df = spark.createDataFrame(rdd1, PersonaMetaData::class.java)

    df.show(false)
}

When I try to create the DataFrame, I get the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type src.Persona is not supported.

Does this mean that creating a DataFrame is not supported for a list of data classes? Please help me understand what is missing in the above code.

CodePudding user response:

Well, it works for me out of the box. I've created a simple app to demonstrate it; check it out here: https://github.com/szymonprz/kotlin-spark-simple-app/blob/master/src/main/kotlin/CreateDataframeFromRDD.kt

You can just run this main function and you will see that the correct content is displayed. If you see something Scala-specific in a Kotlin project, you may need to fix your build tool configuration. You can check the build.gradle inside this project, or read more about it here: https://github.com/JetBrains/kotlin-spark-api/blob/main/docs/quick-start-guide.md
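
For reference, here is a minimal sketch of one way this could look. It is not the content of the linked repository. It assumes the kotlin-spark-api dependency is on the classpath (that is where the toDS extension used in the question comes from) and it skips the RDD round trip by converting the Dataset straight to a DataFrame. The data classes are trimmed versions of the ones in the question, and samplePersonaMetaData is a hypothetical helper added only for illustration.

```kotlin
import org.apache.spark.sql.SparkSession
import org.jetbrains.kotlinx.spark.api.*

// Trimmed versions of the classes from the question; the nesting is what matters.
data class Persona_Items(val key1: Int = 0, val key2: String = "Hello")

data class Persona(
    val persona_type: String,
    val createdAt: Long,
    val listPersonaItems: List<Persona_Items>
)

data class PersonaMetaData(
    val user_id: Int,
    val persona_created: Boolean,
    val listPersona: List<Persona>
)

// Hypothetical helper: builds one record holding two nested personas.
fun samplePersonaMetaData(): PersonaMetaData {
    val items1 = listOf(Persona_Items(1), Persona_Items(key2 = "abc"), Persona_Items(10, "rrr"))
    val items2 = listOf(Persona_Items(10), Persona_Items(20, "rrr"))
    val persona1 = Persona("HelloWorld", 10L, items1)
    val persona2 = Persona("HelloWorld", 10L, items2)
    return PersonaMetaData(884, true, listOf(persona1, persona2))
}

fun main() {
    val spark = SparkSession.builder()
        .master("local[2]")
        .config("spark.driver.host", "127.0.0.1")
        .appName("Simple Application")
        .getOrCreate()

    // toDS is the kotlin-spark-api extension; it derives an encoder for the data class.
    val ds = spark.toDS(listOf(samplePersonaMetaData()))

    // A DataFrame is a Dataset<Row>, so no intermediate RDD is needed here.
    val df = ds.toDF()
    df.printSchema()
    df.show(false)

    spark.stop()
}
```

If this version runs but the original does not, the difference is most likely in the build setup rather than in the data classes themselves, which matches the suggestion above to compare build.gradle files.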
