Home > Back-end >  Filter one column by not matching to another column
Filter one column by not matching to another column

Time:10-27

How can I create a dataframe containing the records of the table1 table that DO NOT match the istituto, service_rap, filiale_rap, codice_rap fields with the table table2

I tried something like this (but this don't work):

val result: Dataset[Row] = table1.where($"istituto".notEqual(table2("istituto")))
val result: Dataset[Row] = table1.where($"istituto" =!= (table2("istituto")))

Error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved attribute(s) istituto#16 missing from istituto#42,servizio_rap#43,filiale_rap#44,codice_rap#45,ndg#46 in operator !Filter NOT (istituto#42 = istituto#16). Attribute(s) with the same name appear in the operation: istituto. Please check if the right attribute(s) are used.;
!Filter NOT (istituto#42 = istituto#16)

table1:

  private val table1: DataFrame = Seq(
    ("03104", "001", "00002", "123456", "ndg1"),
    ("03104", "001", "00002", "123455", "ndg2")
  ).toDF("istituto", "servizio_rap", "filiale_rap", "codice_rap", "ndg")

table2:

  private val secondInput: DataFrame = Seq(
    ("03106", "001", "00002", "123456", "ndg1"))
    .toDF("istituto", "servizio_rap", "filiale_rap", "codice_rap", "ndg")

Expected result:

 -------- ------------ ----------- ---------- ---- 
|istituto|servizio_rap|filiale_rap|codice_rap|ndg |
 -------- ------------ ----------- ---------- ---- 
|03106   |002         |00003      |123465    |ndg1|
 -------- ------------ ----------- ---------- ---- 

CodePudding user response:

From the comment made by @Miko :

Resolved using leftanti join

val result: DataFrame = secondInput.join(input,Seq("servizio_rap","filiale_rap","codice_rap","istituto"),"leftanti") 
  • Related