Home > Software engineering >  Efficient counting of occurrences in a range
Efficient counting of occurrences in a range

Time:12-19

You want to collect information while student grades are being processed. School students have default list numbers "1, 2, 3, 4..." and the program must process the following instructions:

  1. REGISTER (c): Register that the next student on the list obtained a grade c.
  2. COUNT (c, i, j): Count how many students with a grade of c there are between [i, j] (inclusive).

INPUT:
An integer N followed by the N instructions to process. You can assume that 0 <= N <= 100000, grades are between 0 and 100, and all list number ranges will refer to already registered students.

OUTPUT:
The corresponding value for each COUNT instruction.

EXAMPLE:

  • Input:

    7
    REGISTER 8
    REGISTER 7
    REGISTER 8
    COUNT 8 1 2
    COUNT 8 1 3
    REGISTER 7
    COUNT 7 1 2
    
  • Output:

    1
    2
    1
    

As you can imagine, this is an internet problem (this one) and I came up with this solution:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);

    std::vector<short> A;
    int N;

    std::cin >> N;
    while (N--) {
        std::string W;

        std::cin >> W;
        if (W == "REGISTER") {
            short C;

            std::cin >> C;
            A.push_back(C);
        } else {
            int I, J;
            short C;

            std::cin >> C >> I >> J;
            std::cout << std::count(A.begin()   I - 1, A.begin()   J, C)
                      << "\n";
        }
    }

    return 0;
}

Apparently my code is very slow. Could someone help me find a more efficient solution? I've been thinking for quite some time and I just can't find a way.

CodePudding user response:

Decided to implement 3 solutions for you. First SolveSimple() is same as yours, I made it for reference to test correctness of two others. Second SolveBinSearch() is based on Binary Search approach. Third SolveDynProg() is based on Dynamic Programming approach.

Simple solution (same like yours) is about 10x times slower than other two. Binary search is about 2x faster than dynamic programming variant. Binary search variant is also considerably more memory-economic than dyn prog, but for the case of 100 K instructions this memory economy is not a big issue.

Tests in my code are randomly generated. You can tweak number of test instructions num_instructions and max_grade, also flag verbose controls if to output extra debug information like answers of each method. After code you'll find timings.

Each of three methods accepts input and output stream, you can just pass in standard input and output, like SolveBinSearch(std::cin, std::cout);, in the case of sending solutiont to online contest judging system.

If to choose then SolveBinSearch() is the fastest and memory economic solution.

Try it online!

#include <random>
#include <sstream>
#include <string>
#include <vector>
#include <cstdint>
#include <algorithm>
#include <iostream>
#include <iomanip>
#include <array>
#include <chrono>

using std::size_t;
using u8 = uint8_t;
using u32 = uint32_t;

void SolveSimple(std::istream & inp, std::ostream & out) {
    std::vector<u8> grades;
    while (inp) {
        std::string cmd;
        inp >> cmd;
        if (cmd == "REGISTER") {
            size_t grade = 0;
            inp >> grade;
            grades.push_back(grade);
        } else if (cmd == "COUNT") {
            size_t grade = 0, i = 0, j = 0;
            inp >> grade >> i >> j;
            out << std::count(grades.begin()   (i - 1),
                grades.begin()   j, grade) << std::endl;
        } else
            break;
    }
}

void SolveBinSearch(std::istream & inp, std::ostream & out) {
    // https://en.wikipedia.org/wiki/Binary_search_algorithm
    std::vector<std::vector<u32>> cnts(101);
    size_t cnt = 0;
    while (inp) {
        std::string cmd;
        inp >> cmd;
        if (cmd == "REGISTER") {
            size_t grade = 0;
            inp >> grade;
            cnts[grade].push_back(cnt);
              cnt;
        } else if (cmd == "COUNT") {
            size_t grade = 0, i = 0, j = 0;
            inp >> grade >> i >> j;
            auto begin = std::lower_bound(
                cnts[grade].begin(), cnts[grade].end(), i - 1);
            auto end = std::upper_bound(
                cnts[grade].begin(), cnts[grade].end(), j - 1);
            out << (end - begin) << std::endl;
        } else
            break;
    }
}

void SolveDynProg(std::istream & inp, std::ostream & out) {
    // https://en.wikipedia.org/wiki/Dynamic_programming
    std::vector<std::array<u32, 101>> cnts(1);
    while (inp) {
        std::string cmd;
        inp >> cmd;
        if (cmd == "REGISTER") {
            size_t grade = 0;
            inp >> grade;
            cnts.push_back(cnts.back());
              cnts.back()[grade];
        } else if (cmd == "COUNT") {
            size_t grade = 0, i = 0, j = 0;
            inp >> grade >> i >> j;
            out << cnts[j][grade] - cnts[i - 1][grade] << std::endl;
        } else
            break;
    }
}

void Test() {
    bool constexpr verbose = 0;
    size_t constexpr num_instructions = 100000, max_grade = 100;

    auto TimeCur = []{
        return std::chrono::high_resolution_clock::now();
    };
    auto const gtb = TimeCur();
    auto Time = [&]{
        return std::chrono::duration_cast<std::chrono::microseconds>(
            TimeCur() - gtb).count() / 1000000.0;
    };

    std::mt19937_64 rng{123};
    std::stringstream inp;
    size_t cnt = 0;
    for (size_t i = 0; i < num_instructions;   i) {
        if ((rng() & 1) || (cnt == 0)) {
              cnt;
            inp << "REGISTER " << rng() % (max_grade   1) << std::endl;
        } else {
            size_t a = rng() % cnt   1, b = rng() % cnt   1;
            if (a > b)
                std::swap(a, b);
            inp << "COUNT " << rng() % (max_grade   1) << " "
                << a << " " << b << std::endl;
        }
    }
    auto inp_str = inp.str();
    if (verbose) {
        std::cout << "Input: " << std::endl
            << inp.str() << std::endl;
    }

    std::cout << std::fixed << std::boolalpha;
    double tb = 0;
    std::stringstream out_simple, out_binsearch, out_dynprog;

    tb = Time(); inp.clear(); inp.str(inp_str);
    SolveSimple(inp, out_simple);
    std::cout << "Simple time " << (Time() - tb) << std::endl;

    tb = Time(); inp.clear(); inp.str(inp_str);
    SolveBinSearch(inp, out_binsearch);
    std::cout << "BinSearch time " << (Time() - tb) << std::endl;

    tb = Time(); inp.clear(); inp.str(inp_str);
    SolveDynProg(inp, out_dynprog);
    std::cout << "DynProg time " << (Time() - tb) << std::endl;

    if (verbose) {
        std::cout << "Simple: " << std::endl
            << out_simple.str() << std::endl;
        std::cout << "BinSearch: " << std::endl
            << out_binsearch.str() << std::endl;
        std::cout << "DynProg: " << std::endl
            << out_dynprog.str() << std::endl;
    }
    std::cout << "Simple == BinSearch: "
        << (out_simple.str() == out_binsearch.str()) << std::endl;
    std::cout << "Simple == DynProg: "
        << (out_simple.str() == out_dynprog.str()) << std::endl;
    std::cout << "BinSearch == DynProg: "
        << (out_binsearch.str() == out_dynprog.str()) << std::endl;
}

int main() {
    Test();
}

Output:

Simple time 0.184678
BinSearch time 0.020600
DynProg time 0.048994
Simple == BinSearch: true
Simple == DynProg: true
BinSearch == DynProg: true
  • Related