I have a snowflake database and it doesn't support unique constraint enforcement (https://docs.snowflake.com/en/sql-reference/constraints-overview.html).
I'm planning to have a method on JPA repository with a custom SQL query to check for duplicates before inserting to the table.
Entity
@Entity
@Table(name = "STUDENTS")
public class Students {
@Id
@Column(name = "ID", columnDefinition = "serial")
@GenericGenerator(name = "id_generator", strategy = "increment")
@GeneratedValue(generator = "id_generator")
private Long id;
@Column(name = "NAME")
private String studentName;
}
Snowflake create table query
CREATE table STUDENTS(
id int identity(1,1) primary key,
name VARCHAR NOT NULL,
UNIQUE(name)
);
Repository
public interface StudentRepository extends JpaRepository<Students, Long> {
//
@Query(value = "???", nativeQuery = true)
List<Student> bulkUpsertStudents(List<Student> students);
}
CodePudding user response:
You can use a SELECT
query to check for duplicate values in the name
column before inserting a new record into the table. For example:
@Query(value = "SELECT * FROM STUDENTS WHERE name = :name", nativeQuery = true)
List<Student> findByName(@Param("name") String name);
This method will return a list of Student
records with the specified name
value. If the list is empty, it means that there are no records with that name
value, and you can safely insert a new record with that name
value.
List<Student> studentList = new ArrayList<>();
for (Student student : students) {
List<Student> existingStudents = studentRepository.findByName(student.getName());
if (existingStudents.isEmpty()) {
studentsToInsert.add(student);
}
}
studentRepository.bulkUpsertStudents(studentList)
EDIT
If the above solution doesn't work. You can use the MERGE
statement to update existing records in the table if the data has changed. For example, if you want to update the name of a Student
if it has changed, you can use the following MERGE
statement:
@Query(value = "MERGE INTO students t USING (SELECT :name AS name, :newName AS newName) s
ON t.name = s.name
WHEN MATCHED AND t.name <> s.newName THEN UPDATE SET t.name = s.newName
WHEN NOT MATCHED THEN INSERT (name) VALUES (s.name)", nativeQuery = true)
List<Student> bulkUpsertStudents(List<Student> students);
This query will update the name of each Student
in the students
list if it has changed, and if a conflict occurs, it will not insert a new record. This will ensure that only unique name values are inserted into the table, without having to perform a separate query for each record.
CodePudding user response:
I was able to overcome this using the below approach but need to verify the performance of the queries.
Repository saveAll() method to save all the entities.
Using the custom nativeQuery as below
INSERT OVERWRITE INTO STUDENTS WITH CTE AS(SELECT ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY ID) AS RNO, ID, NAME FROM STUDENTS) SELECT ID, NAME FROM CTE WHERE RNO = 1;
Example code :
import static io.vavr.collection.List.ofAll;
import static io.vavr.control.Option.of;
import static java.util.function.Predicate.not;
public Validation<ValidationError, List<Students>> saveAll(List<String> students) {
return of(students)
.filter(not(List::isEmpty))
.map(this::mapToEntities) // maps the list to list of database entities
.map(repository::saveAll) // save all
.toValidation(ERROR_SAVING_STUDENTS) // vavr validation in case of error
.peek(x -> repository.purgeStudents()) // purging to remove duplicates
.toValidation(ERROR_PURGING_STUDENTS);
}
This issue is only due to snowflake's incapability to check uniqueness.