Home > Blockchain >  Java Algorithm Optimisation Question using a HashMap of Lists
Java Algorithm Optimisation Question using a HashMap of Lists

Time:01-10

If I had the following

postcodeToCompaniesList = new HashMap<String, List<Company>>();

After some processig the hashmap has many thousands of postcodes, and each postcode is mapped to a list with 1 or more Company objects.

Now I want to get the top 10 most popular postcodes and print out the number of companies at the postcode i.e. "Postcode SW1A 0AA has 50 companies".

The approach I've taken, and I think this is naive, is to create a Record object to hold the postcode and the corrisponding number of companies at that postcode. I then create a RecordList object for managing a list of 10 Record objects which have the highest count value (see below).

The RecordList.addRecord(Record record) method adds records until 10 are added and then conditionally adds more only if the count value is higher than the lowest value currently in the list. Then it sorts the list and removes the lowest value thereby maintaing a count of 10 elements.

final class Record implements Comparable<Record> {
    public String postcode;
    public int count;

    public Record(String postcode, int count){
        this.postcode = postcode;
        this.count = count;
    }

    @Override 
    public int compareTo(Record r){
        return Integer.valueOf(this.count).compareTo(Integer.valueOf(r.count));
    }
}

final class RecordList{
    List<Record> top10 = new ArrayList<Record>();
    
    public void addRecord(Record record){
        if (this.top10.size() < 10){
            this.top10.add(record);
            Collections.sort(this.top10);
        } else if (record.count > this.top10.get(0).count){
            this.top10.add(record);
            Collections.sort(this.top10);
            this.top10.remove(0);
        }
    }
}

Next I use a simple loop to loop through all the keys (postcodes) and add them one at a time to the RecordList along with the companyCount.

RecordList recordList = new RecordList();
Set<String> postcodes = this.postcodeToCompaniesList.keySet();

for(String postcode : postcodes){
  int companyCount = this.postcodeToCompaniesList.get(postcode).size();
  recordList.addRecord(new Record(postcode, companyCount));
}

System.out.println(gson.toJson(recordList));

This works. However, I get the feeling that this is an very inefficent way to do this? Maybe Streams would work better??

What are the thoughts from the community? Is there a better, more elegent way to do this?

CodePudding user response:

I'm going to assume List of Company is a List of Strings to make things simpler. I think this should do it:

HashMap<String, Integer> counter = new HashMap<>(); // postcodes counter
postcodeToCompaniesList.forEach((postcode, company) -> counter.put(postcode, company.size()));

counter.entrySet().stream()
    .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
    .limit(10)
    .forEach(System.out::println);

CodePudding user response:

You could stream over your entries sort them by the list size in reverse order, limit to n entries (in your case 10) and collect them in a LinkedHashMap:

private static Map<String, List<Company>> topN(Map<String, List<Company>> map, int n) {
    return map.entrySet()
              .stream()
              .sorted(Comparator.comparingInt((Map.Entry<String, List<Company>> e) -> e.getValue().size()).reversed())
              .limit(n)
              .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (x, y) -> y, LinkedHashMap::new));
}

and use it like below, for example to get the top 7 postcodes:

topN(postcodeToCompaniesList, 7)
        .forEach((k,v) -> System.out.printf("Postcode %s has %d companies.%n", k, v.size()));
  • Related