Home > database >  How can I join two lists in less than O(N*M)?
How can I join two lists in less than O(N*M)?

Time:12-25

Assume we have two tables (think as in SQL tables), where the primary key in one of them is the foreign key in the other. I'm supposed to write a simple algorithm that would imitate the joining of these two tables. I thought about iterating over each element in the primary key column in the first table, having a second loop where it checks if the foreign key matches, then store it in an external array or list. However, this would take O(N*M) and I need to find something better. There is a hint in the textbook that it involves hashing, however, I'm not sure how hashing could be implemented here or how it would make it better?

Editing to add an example:

Table Employees
|ID (Primary Key)| Name | Department|
|:---------------|:----:|----------:|
|       1        | Jack |    IT     |   
|       2        | Amy  |  Finance  |

Table Transactions
|Sold Product| Sold By(Foreign Key)| Sold To|
|:-----------|:-------------------:|-------:|
|    TV      |          1          |  Mary  |
|   Radio    |          1          |  Bob   |
|   Mobile   |          2          |  Lisa  |

What I want to do is write an algorithm that joins these two tables using hashing, and not anything complex, just a simple idea on how to do that.

CodePudding user response:

Populate a HashMap with the primary table elements, using their keys as the map key and the whole object as the value. This requires only 1 pass over the primary table and each operation to add to the HashMap occurs in constant time O(1). This is akin to creating a database index.

Iterate over the child table, “joining” to the parent row in constant time O(1) by getting the parent from the map.

Total runtime is 1 pass over each “table”, so O(n m).

Code might look something like:

class Parent {
    int id;
    // other fields, getters etc
}

class Child {
    int parentId;
    // other fields, getters etc
}

void join(List<Parent> parents, List<Child> children) {
    Map<Integer, Parent> parentMap = parents.stream()
      .collect(toMap(Parent::getKey, p -> p)); // FYI toMap collects to a HashMap
    for (Child child : children) {
        Parent parent = parentMap.get(child.getParentId());
        // not sure what “join” means, but you now have child and its parent
    }
}

CodePudding user response:

Read the child table's primary and foreign keys into a map where the keys are the foreign keys and the values are the primary keys. Keep in mind that one foreign key can map to multiple primary keys if this is a one to many relationship.

Now iterate over the primary keys of the mother table and for each primary key check whether it exists in the map. If so, you add a tuple of the primary keys of the rows that have a relation to the array (or however you want to save it).

The time complexity is O(n m). Iterate over the rows of each table once. Since the lookup in the map is constant, we don't need to add it.

Space complexity is O(m) where m is the number of rows in the child table. This is some additional space you use in comparison to the naive solution to improve the time complexity.

CodePudding user response:

Try this.

record Employees(int id, String name, String department) {}
record Transactions(String soldProduct, int soldBy, String soldTo) {}
record Joined(String soldProduct, int soldBy, String soldTo, String name, String department) {}

public static void main(String[] args) {
    List<Employees> employees = List.of(
        new Employees(1, "Jack", "IT"),
        new Employees(2, "Amy", "Finance"));
    List<Transactions> transactions = List.of(
        new Transactions("TV", 1, "Mary"),
        new Transactions("Radio", 1, "Bob"),
        new Transactions("Mobile", 2, "Lisa"));

    Map<Integer, Employees> mapEmployee = employees.stream()
        .collect(Collectors.toMap(Employees::id, Function.identity()));
    List<Joined> joined = transactions.stream()
        .map(t -> {
            Employees e = mapEmployee.get(t.soldBy());
            return new Joined(t.soldProduct(), t.soldBy(), t.soldTo(), e.name(), e.department());
        })
        .toList();

    for (Joined e : joined)
        System.out.println(e);
}

output:

Joined[soldProduct=TV, soldBy=1, soldTo=Mary, name=Jack, department=IT]
Joined[soldProduct=Radio, soldBy=1, soldTo=Bob, name=Jack, department=IT]
Joined[soldProduct=Mobile, soldBy=2, soldTo=Lisa, name=Amy, department=Finance]
  • Related