I have a task to read two files, match their contents, and report the unmatched entries from both. That means I have to report how many entries match across the two files, how many entries in file 1 are not in file 2, and how many entries in file 2 are not in file 1.
My approach is to read the files, create Java objects from them, put the contents of the two files into two separate ArrayLists, and compare them. My current code is listed below. To clarify, I want to compare the content of the objects (e.g. match on EmployeeID across both files).
In the code below, I match the file 1 content against file 2 and remove the matched entries from file 2. This works fine for finding the matches and getting the unmatched count of file 1 compared to file 2.
I am planning to take the remaining items in file 2 and run another round of the same compareByEmpIdandDOB method, with fileTwoEmpList as the first parameter and fileOneEmpList as the second, to get the unmatched count of file 2 compared to file 1. But I feel this is overkill and not very efficient. Can someone point out a different approach, if there is one?
Both of the ArrayLists are sorted. Thanks in advance!
import java.util.Collections;
import java.util.List;

public class EmpMatching {
    public void compareLists(List<EmployeeDetails> fileOneEmpList, List<EmployeeDetails> fileTwoEmpList){
        Collections.sort(fileOneEmpList);
        Collections.sort(fileTwoEmpList);
        List<EmployeeDetails> unmatchedFromListTwo = compareByEmpIdandDOB(fileOneEmpList,fileTwoEmpList);
    }
    public List<EmployeeDetails> compareByEmpIdandDOB(List<EmployeeDetails> fileOneEmpList,List<EmployeeDetails> fileTwoEmpList){
        int matchEmpCountFromTwoFiles = 0;
        System.out.println("File One List Size Before Recon " + fileTwoEmpList.size());
        for(EmployeeDetails fileOneEmp : fileOneEmpList){
            for(int index = 0;index < fileTwoEmpList.size();index++){
                EmployeeDetails fileTwoEmp= fileTwoEmpList.get(index);
                // A match requires both the employee ID and the date of birth to be equal
                if(fileOneEmp.getEmpID().equals(fileTwoEmp.getEmpID()) && fileOneEmp.getEmpDOB().equals(fileTwoEmp.getEmpDOB())){
                    matchEmpCountFromTwoFiles++;
                    fileTwoEmpList.remove(fileTwoEmp);
                    System.out.println("Match Found " + fileOneEmp.getEmpID());
                }
            }
            System.out.println("File Two List Size " + fileTwoEmpList.size());
        }
        System.out.println("Match Count >>>>> " + matchEmpCountFromTwoFiles);
        System.out.println("File Two List Size >>>>> " + fileTwoEmpList.size());
        // Whatever is left in the second list had no match in the first list
        return fileTwoEmpList;
    }
}
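//Model class
public class EmployeeDetails implements Comparable<EmployeeDetails>{
    private String EmpID;
    private String EmpName;
    private String EmpDOB;
    // Getters used by the matching code (constructor and setters omitted)
    public String getEmpID() { return EmpID; }
    public String getEmpDOB() { return EmpDOB; }
    @Override
    public int compareTo(EmployeeDetails o) {
        // Order by employee ID, then by date of birth, so that Collections.sort() has an effect
        int byId = EmpID.compareTo(o.EmpID);
        return byId != 0 ? byId : EmpDOB.compareTo(o.EmpDOB);
    }
}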
CodePudding user response:
You don't need to sort these lists for this task.
In terms of set theory, you need to find the symmetric difference, i.e. all unique objects that appear only in the first or only in the second list.
This task can be solved in a few lines of code with linear time complexity. But it is important to implement the equals/hashCode contract in the EmployeeDetails class.
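public List<EmployeeDetails> compareLists(List<EmployeeDetails> fileOneEmpList,
                                          List<EmployeeDetails> fileTwoEmpList) {
    Set<EmployeeDetails> emp1 = new HashSet<>(fileOneEmpList);
    Set<EmployeeDetails> emp2 = new HashSet<>(fileTwoEmpList);
    // Entries present in both files; computed before emp1 and emp2 are modified
    Set<EmployeeDetails> common = new HashSet<>(emp1);
    common.retainAll(emp2);
    emp1.removeAll(common); // now holds entries found only in file one
    emp2.removeAll(common); // now holds entries found only in file two
    emp1.addAll(emp2);
    return new ArrayList<>(emp1);
}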
The approach above is both efficient and simple: each HashSet operation runs in linear time on average.
If you are comfortable with the Streams API, you can try another approach and implement this method in the following way:
public List<EmployeeDetails> compareLists(List<EmployeeDetails> fileOneEmpList,
                                          List<EmployeeDetails> fileTwoEmpList) {
    // Wrapping with sets to ensure uniqueness
    // (if objects in the lists are guaranteed to be unique, use the lists directly)
    return Stream.of(new HashSet<>(fileOneEmpList), new HashSet<>(fileTwoEmpList))
        .flatMap(Collection::stream)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
        .entrySet().stream()
        .filter(entry -> entry.getValue() == 1) // i.e. the object appears only once, either in the first or in the second list
        .map(Map.Entry::getKey)
        .collect(Collectors.toList()); // or .toList() on Java 16+
}
The time complexity of the stream-based solution is linear as well. But as I've said, the first solution based on the Collections API is simpler and slightly more performant.
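Both versions above are meant to return the unmatched entries of the two files as one combined list. If the three counts are needed separately (matched, only in file 1, only in file 2), the same set operations can produce them. Below is a minimal sketch, assuming the element type implements equals()/hashCode(); the Reconciliation class and its field names are hypothetical, and plain strings stand in for EmployeeDetails.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original code
final class Reconciliation<T> {
    final Set<T> matched;      // present in both lists
    final Set<T> onlyInFirst;  // present in the first list only
    final Set<T> onlyInSecond; // present in the second list only

    private Reconciliation(Set<T> matched, Set<T> onlyInFirst, Set<T> onlyInSecond) {
        this.matched = matched;
        this.onlyInFirst = onlyInFirst;
        this.onlyInSecond = onlyInSecond;
    }

    // Relies on T having consistent equals()/hashCode() implementations
    static <T> Reconciliation<T> of(List<T> first, List<T> second) {
        Set<T> firstSet = new HashSet<>(first);
        Set<T> secondSet = new HashSet<>(second);

        Set<T> matched = new HashSet<>(firstSet);
        matched.retainAll(secondSet);       // intersection

        Set<T> onlyInFirst = new HashSet<>(firstSet);
        onlyInFirst.removeAll(secondSet);   // first minus second

        Set<T> onlyInSecond = new HashSet<>(secondSet);
        onlyInSecond.removeAll(firstSet);   // second minus first

        return new Reconciliation<>(matched, onlyInFirst, onlyInSecond);
    }
}

class ReconciliationDemo {
    public static void main(String[] args) {
        // Strings of the form "empId|dob" stand in for EmployeeDetails objects
        List<String> fileOne = List.of("E1|1990-01-01", "E2|1985-05-05", "E3|1970-07-07");
        List<String> fileTwo = List.of("E2|1985-05-05", "E3|1970-07-07", "E4|2000-12-12");

        Reconciliation<String> result = Reconciliation.of(fileOne, fileTwo);
        System.out.println("Matched: " + result.matched.size());             // Matched: 2
        System.out.println("Only in file 1: " + result.onlyInFirst.size());  // Only in file 1: 1
        System.out.println("Only in file 2: " + result.onlyInSecond.size()); // Only in file 2: 1
    }
}
```

Keep in mind that duplicates inside a single file collapse into one entry here, so the counts refer to distinct entries.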
If for some reason there's no proper implementation of equals() and hashCode() in EmployeeDetails, and you have no control over this class and can't change it, you can declare a wrapper class and perform the same actions on the wrappers.
Below is an example of how to create the wrapper using a Java 16 record. The equals() and hashCode() methods will be generated by the compiler based on empId and empDob.
public record EmployeeWrapper(String empId, String empDob) {
    public EmployeeWrapper(EmployeeDetails details) {
        // Use the getters: the fields of EmployeeDetails are private
        this(details.getEmpID(), details.getEmpDOB());
    }
}
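To use the wrapper, each list can be mapped to a set of wrappers before applying the set operations. Below is a minimal sketch, assuming Java 16 or later; the EmployeeDetails stand-in (with its constructor) and the WrapperDemo class are hypothetical and only there to make the example self-contained.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Stand-in for the class that cannot be modified (no equals/hashCode); the constructor is assumed
class EmployeeDetails {
    private final String empID;
    private final String empName;
    private final String empDOB;

    EmployeeDetails(String empID, String empName, String empDOB) {
        this.empID = empID;
        this.empName = empName;
        this.empDOB = empDOB;
    }

    String getEmpID() { return empID; }
    String getEmpDOB() { return empDOB; }
}

// equals() and hashCode() are generated from empId and empDob
record EmployeeWrapper(String empId, String empDob) {
    EmployeeWrapper(EmployeeDetails details) {
        this(details.getEmpID(), details.getEmpDOB());
    }
}

class WrapperDemo {
    // Wraps every element so that set operations compare by empId and empDob
    static Set<EmployeeWrapper> wrap(List<EmployeeDetails> employees) {
        return employees.stream().map(EmployeeWrapper::new).collect(Collectors.toSet());
    }

    // Entries that appear in exactly one of the two lists
    static Set<EmployeeWrapper> unmatched(List<EmployeeDetails> one, List<EmployeeDetails> two) {
        Set<EmployeeWrapper> first = wrap(one);
        Set<EmployeeWrapper> second = wrap(two);

        Set<EmployeeWrapper> common = new HashSet<>(first);
        common.retainAll(second);          // intersection

        Set<EmployeeWrapper> result = new HashSet<>(first);
        result.addAll(second);             // union
        result.removeAll(common);          // union minus intersection
        return result;
    }

    public static void main(String[] args) {
        List<EmployeeDetails> fileOne = List.of(
                new EmployeeDetails("E1", "Ann", "1990-01-01"),
                new EmployeeDetails("E2", "Bob", "1985-05-05"));
        List<EmployeeDetails> fileTwo = List.of(
                new EmployeeDetails("E2", "Robert", "1985-05-05"),
                new EmployeeDetails("E3", "Cy", "1970-07-07"));

        // E2 matches on ID and date of birth, so only E1 and E3 remain
        System.out.println(unmatched(fileOne, fileTwo).size()); // 2
    }
}
```

The result contains wrappers rather than the original objects, so the name is not available from it. If the full EmployeeDetails is needed afterwards, the wrapper could carry a reference to it as a field that does not take part in equals()/hashCode(), which a record cannot express and would require a regular class.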
The implementation of equals/hashCode for the EmployeeDetails class based on empID and empDOB might look like this (you can also use the facilities of your IDE to generate these methods):
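// Assumes the fields are named empID and empDOB; adjust the names if your class
// declares them differently (the question's model uses EmpID and EmpDOB).
// Requires: import java.util.Objects;
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    EmployeeDetails that = (EmployeeDetails) o;
    return empID.equals(that.empID) && empDOB.equals(that.empDOB);
}
@Override
public int hashCode() {
    return Objects.hash(empID, empDOB);
}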
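To see why the contract matters for the set-based approach, here is a small sketch comparing a class without equals/hashCode to one with them. PlainEmployee, KeyedEmployee and EqualsHashCodeDemo are hypothetical names used only for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class WITHOUT equals/hashCode: instances are compared by identity
class PlainEmployee {
    final String empID;
    final String empDOB;

    PlainEmployee(String empID, String empDOB) {
        this.empID = empID;
        this.empDOB = empDOB;
    }
}

// Hypothetical class WITH equals/hashCode based on empID and empDOB
class KeyedEmployee {
    final String empID;
    final String empDOB;

    KeyedEmployee(String empID, String empDOB) {
        this.empID = empID;
        this.empDOB = empDOB;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        KeyedEmployee that = (KeyedEmployee) o;
        return empID.equals(that.empID) && empDOB.equals(that.empDOB);
    }

    @Override
    public int hashCode() {
        return Objects.hash(empID, empDOB);
    }
}

class EqualsHashCodeDemo {
    // Two objects with the same content stay separate in the set
    static int distinctPlain() {
        Set<PlainEmployee> set = new HashSet<>();
        set.add(new PlainEmployee("E1", "1990-01-01"));
        set.add(new PlainEmployee("E1", "1990-01-01"));
        return set.size();
    }

    // Two objects with the same content are treated as one entry
    static int distinctKeyed() {
        Set<KeyedEmployee> set = new HashSet<>();
        set.add(new KeyedEmployee("E1", "1990-01-01"));
        set.add(new KeyedEmployee("E1", "1990-01-01"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPlain()); // 2
        System.out.println(distinctKeyed()); // 1
    }
}
```

Without the overrides, removeAll() and retainAll() would never find a match between objects read from different files, so every entry would be reported as unmatched.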