Is there a way to have custom SQL query on top of JPA repository to have BULK UPSERTS?-CodePudding

I have a snowflake database and it doesn't support unique constraint enforcement (https://docs.snowflake.com/en/sql-reference/constraints-overview.html).

I'm planning to have a method on JPA repository with a custom SQL query to check for duplicates before inserting to the table.

Entity

@Entity
@Table(name = "STUDENTS")
public class Students {

  @Id
  @Column(name = "ID", columnDefinition = "serial")
  @GenericGenerator(name = "id_generator", strategy = "increment")
  @GeneratedValue(generator = "id_generator")
  private Long id;

  @Column(name = "NAME")
  private String studentName;

}

Snowflake create table query

CREATE table STUDENTS(
    id int identity(1,1) primary key,
    name VARCHAR NOT NULL,
    UNIQUE(name)
);

Repository

public interface StudentRepository extends JpaRepository<Students, Long> {

//
@Query(value = "???", nativeQuery = true)
    List<Student> bulkUpsertStudents(List<Student> students);

}

CodePudding user response：

You can use a SELECT query to check for duplicate values in the name column before inserting a new record into the table. For example:

@Query(value = "SELECT * FROM STUDENTS WHERE name = :name", nativeQuery = true)
List<Student> findByName(@Param("name") String name);

This method will return a list of Student records with the specified name value. If the list is empty, it means that there are no records with that name value, and you can safely insert a new record with that name value.

List<Student> studentList = new ArrayList<>();
for (Student student : students) {
    List<Student> existingStudents = studentRepository.findByName(student.getName());
    if (existingStudents.isEmpty()) {
        studentsToInsert.add(student);
    }
}
studentRepository.bulkUpsertStudents(studentList)

EDIT

If the above solution doesn't work. You can use the MERGE statement to update existing records in the table if the data has changed. For example, if you want to update the name of a Student if it has changed, you can use the following MERGE statement:

@Query(value = "MERGE INTO students t USING (SELECT :name AS name, :newName AS newName) s
ON t.name = s.name 
WHEN MATCHED AND t.name <> s.newName THEN UPDATE SET t.name = s.newName
WHEN NOT MATCHED THEN INSERT (name) VALUES (s.name)", nativeQuery = true)
List<Student> bulkUpsertStudents(List<Student> students);

This query will update the name of each Student in the students list if it has changed, and if a conflict occurs, it will not insert a new record. This will ensure that only unique name values are inserted into the table, without having to perform a separate query for each record.

CodePudding user response：

I was able to overcome this using the below approach but need to verify the performance of the queries.

Repository saveAll() method to save all the entities.

Using the custom nativeQuery as below

   INSERT OVERWRITE INTO STUDENTS
   WITH CTE AS(SELECT ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY ID) AS RNO, ID, NAME FROM STUDENTS)
   SELECT ID, NAME FROM CTE WHERE RNO = 1;

Example code :

import static io.vavr.collection.List.ofAll;
import static io.vavr.control.Option.of;
import static java.util.function.Predicate.not;

 public Validation<ValidationError, List<Students>> saveAll(List<String> students) {
    return of(students)
        .filter(not(List::isEmpty))
        .map(this::mapToEntities) // maps the list to list of database entities
        .map(repository::saveAll) // save all
        .toValidation(ERROR_SAVING_STUDENTS) // vavr validation in case of error
        .peek(x -> repository.purgeStudents()) // purging to remove duplicates
        .toValidation(ERROR_PURGING_STUDENTS);
  }

This issue is only due to snowflake's incapability to check uniqueness.