I have 2 datasets in java spark, first one contains id, name , age and second one are the same, i need to check values (name and id) and if it's similar update the age with new age in dataset2
i tried all possible ways but i found that java in spark don't have much resourses and all possible ways not worked
this is one i tried :
dataset1.createOrReplaceTempView("updatesTable");
datase2.createOrReplaceTempView("carsTheftsFinal2");
updatesNew.show();
Dataset<Row> test = spark.sql( "ALTER carsTheftsFinal2 set carsTheftsFinal2.age = updatesTable.age from updatesTable where carsTheftsFinal2.id = updatesTable.id AND carsTheftsFinal2.name = updatesTable.name ");
test.show(12);
and this is the error :
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: no viable alternative at input 'ALTER carsTheftsFinal2'(line 1, pos 6)
I have hint: that i can use join to update without using update statement ( java spark not provide update )
CodePudding user response:
Assume that we have ds1
with this data:
--- ----------- ---
| id| name|age|
--- ----------- ---
| 1| Someone| 18|
| 2| Else| 17|
| 3|SomeoneElse| 14|
--- ----------- ---
and ds2
with this data:
--- ---------- ---
| id| name|age|
--- ---------- ---
| 1| Someone| 14|
| 2| Else| 18|
| 3|NotSomeone| 14|
--- ---------- ---
According to your expected result, the final table would be:
--- ----------- ---
| id| name|age|
--- ----------- ---
| 3|SomeoneElse| 14| <-- not modified, same as ds
| 1| Someone| 14| <-- modified, was 18
| 2| Else| 18| <-- modified, was 17
--- ----------- ---
This is achieved with the following transformations, first, we rename ds2
's age
with age2
.
val renamedDs2 = ds2.withColumnRenamed("age", "age2")
Then:
// we join main dataset with the renamed ds2, now we have age and age2
ds1.join(renamedDs2, Seq("id", "name"), "left")
// we overwrite age, if age2 is not null from ds2, take it, otherwise leave age
.withColumn("age",
when(col("age2").isNotNull, col("age2")).otherwise(col("age"))
)
// finally, we drop age2
.drop("age2")
Hope this does what you want!