Home > Software design >  update row value using another table in spark java
update row value using another table in spark java

Time:07-22

I have 2 datasets in java spark, first one contains id, name , age and second one are the same, i need to check values (name and id) and if it's similar update the age with new age in dataset2

i tried all possible ways but i found that java in spark don't have much resourses and all possible ways not worked

this is one i tried :

        dataset1.createOrReplaceTempView("updatesTable");
        datase2.createOrReplaceTempView("carsTheftsFinal2");
        updatesNew.show();
          Dataset<Row> test = spark.sql( "ALTER carsTheftsFinal2 set carsTheftsFinal2.age = updatesTable.age from updatesTable where carsTheftsFinal2.id = updatesTable.id AND carsTheftsFinal2.name = updatesTable.name ");
         test.show(12);

and this is the error :

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: no viable alternative at input 'ALTER carsTheftsFinal2'(line 1, pos 6)

I have hint: that i can use join to update without using update statement ( java spark not provide update )

CodePudding user response:

Assume that we have ds1 with this data:

 --- ----------- --- 
| id|       name|age|
 --- ----------- --- 
|  1|    Someone| 18|
|  2|       Else| 17|
|  3|SomeoneElse| 14|
 --- ----------- --- 

and ds2 with this data:

 --- ---------- --- 
| id|      name|age|
 --- ---------- --- 
|  1|   Someone| 14|
|  2|      Else| 18|
|  3|NotSomeone| 14|
 --- ---------- --- 

According to your expected result, the final table would be:

 --- ----------- --- 
| id|       name|age|
 --- ----------- --- 
|  3|SomeoneElse| 14| <-- not modified, same as ds
|  1|    Someone| 14| <-- modified, was 18
|  2|       Else| 18| <-- modified, was 17
 --- ----------- --- 

This is achieved with the following transformations, first, we rename ds2's age with age2.

val renamedDs2 = ds2.withColumnRenamed("age", "age2")

Then:

// we join main dataset with the renamed ds2, now we have age and age2
ds1.join(renamedDs2, Seq("id", "name"), "left")
  // we overwrite age, if age2 is not null from ds2, take it, otherwise leave age
  .withColumn("age", 
    when(col("age2").isNotNull, col("age2")).otherwise(col("age"))
  )
  // finally, we drop age2
  .drop("age2")

Hope this does what you want!

  • Related