Home > Software design >  Set operations between lists of different object types
Set operations between lists of different object types

Time:03-03

I have two lists, a List<User> localUsers and a List<RemoteUser> remoteUsers. One list are local users on a server, and the other list contains users from another server obtained through gRPC. They might be new, exist within our system and need updating, or are inactive (they exist within the first list but not the second).

In order to determine which users are which, I'd like to perform set operations between these lists, but the lists are between objects of different types. To get around this, I found an implementation of set operations using streams, but they are somewhat cumbersome and difficult to read:

// This operation returns a difference between remote users and local users
// signifies new users that don't yet exist on the server
List<RemoteUser> newUsers = remoteUsers.stream()
.filter(remoteUser ->
localUsers.stream().noneMatch(
localUser ->
localUser.getUserName().equals(checkAddPrefix(remoteUser.getLogin())))).collect(Collectors.toList());

I realize that I can do away with the ambiguity of the stream by just doing some kind of double for loop, but anyway...

Finding the inactive users is the same as above with the lists switched around.

Common users between both groups can be performed with

remoteUsers.removeAll(newRemoteUsers)

Is there a cleaner way to perform set operations between Lists of differing object types? Time complexity of the difference operation is O(m * n) for m = len(localusers), n = len(remoteUsers).

Can I get better than this?

CodePudding user response:

As Jim Garrison suggested you can create a class that will define the equals() and hashCode() based on the user credentials like login and at the same time will store references to the instances User and RemoteUser. So that it will be possible to compare these wrapper objects and then extract the original instances of User or RemoteUser.

The client code could look like this:

public static void main(String[] args) {
    List<UserData> remoteData = remoteUsers.stream()
            .map(remUser -> new UserData(remUser.getLogin(), remUser, null))
            .collect(Collectors.toList());

    Set<UserData> localData = localUsers.stream()
            .map(locUser -> new UserData(locUser.getUserName(), null, locUser))
            .collect(Collectors.toSet());

    remoteData.removeAll(localData); // will perform in O(n) time

    List<RemoteUser> newRemoteUsers = remoteData.stream()
            .map(UserData::getRemoteUser)
            .collect(Collectors.toList());
}

Note that in this case the time complexity of the removeAll() method will be O(n), not O(n * m). If you look at the source code you'll find that this operation is performed in a single pass through the list and for every element in the list method contains() will be invoked on the collection that is passed as an argument. Since in this case, it's a Set the cost of contains() is O(1) the overall time complexity is O(n).

And the wrapper class might be the following:

public class UserData{
    private final String login;
    private final RemoteUser remoteUser;
    private final User localUser;

    public UserData(String login, RemoteUser remoteUser, User localUser) {
        this.login = login;
        this.remoteUser = remoteUser;
        this.localUser = localUser;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof UserData other) {
            return this.login.equals(other.login);
        } else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        return Objects.hash(login);
    }

    // getters
}
  • Related