Cycling through an interval in C efficiently


I have a dynamically allocated array containing a lot of numbers (200,000), and I have to find out whether (and how many of) these numbers fall within a given interval. There can be duplicates, and the numbers are in random order.

Example of numbers I get at the beginning:

{1,2,3,1484984,48941651,489416,1816,168189161,6484,8169181,9681916,121,231,684979,795641,231484891,...}

Given interval:

<2;150000>

I wrote a simple algorithm with two for loops that cycle through all the numbers:

    /* For every value in the interval, scan the whole array. */
    for (int j = 0; j <= numberOfRepeats; j++) {
        for (int i = 0; i < arraySize; i++) {
            if (currentNumber == array[i]) {
                counter++;
            }
        }
        currentNumber++;
    }
    printf(" -> %d\n", counter);

This algorithm is too slow for my task. Is there a more efficient way to implement my solution? Could sorting the array by value help here, or would the sort itself be too slow?

Example of working program:

{ 1, 7, 22, 4, 7, 5, 11, 9, 1 }

<4;7>

 -> 4

CodePudding user response:

The problem was as simple as the single comment on my question suggested: there was no reason for the second loop. A single loop does it alone.

My changed code:

    for (int i = 0; i < arraySize; i++) {
        if (array[i] >= startOfInterval && array[i] <= endOfInterval) {
            counter++;
        }
    }
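For context, here is a minimal self-contained sketch built around that fixed loop. The sample data and the expected output come from the example in the question; the names startOfInterval and endOfInterval just mirror the fragment above and are otherwise my assumption about the surrounding code:

    #include <stdio.h>

    int main(void)
    {
        /* Small sample from the question; the real array is
           dynamically allocated and holds ~200,000 numbers. */
        int array[] = { 1, 7, 22, 4, 7, 5, 11, 9, 1 };
        int arraySize = sizeof array / sizeof array[0];
        int startOfInterval = 4, endOfInterval = 7;
        int counter = 0;

        /* One pass: test each element against the inclusive interval. */
        for (int i = 0; i < arraySize; i++) {
            if (array[i] >= startOfInterval && array[i] <= endOfInterval) {
                counter++;
            }
        }
        printf(" -> %d\n", counter);   /* prints " -> 4" for this input */
    }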

CodePudding user response:

This algorithm is too slow for my task. Is there a more efficient way to implement my solution? Could sorting the array by value help here, or would the sort itself be too slow?

Of course it is slow. A single-pass algorithm suffices to count the elements that fall in the interval: scan the array once and count every element that passes the test (n[i] >= lower_bound && n[i] <= upper_bound, or a similar condition).

Only if you need to treat duplicates specially (e.g. not counting them twice) do you need to track whether you have already seen a value. In that case, sorting helps: a qsort(3) call is O(n log n) versus the O(n²) of your double loop, and a second pass over the sorted data brings the total to O(n log n + n), still far below O(n²) for the amount of data you have.

Sorting has the advantage of putting all the repeated values next to each other, so you only have to check whether the element you are processing equals the one you read just before it; if it is different, count it when it lies in the specified range.
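A minimal sketch of that sort-then-scan approach, using the sample array and the inclusive interval <4;7> from the question (the comparison helper cmp_int is my own name, not from the original post):

#include <stdio.h>
#include <stdlib.h>

/* Comparison function for qsort(3): ascending order of ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int array[] = { 1, 7, 22, 4, 7, 5, 11, 9, 1 };
    size_t n = sizeof array / sizeof array[0];
    int lo = 4, hi = 7;                 /* the interval <4;7> */
    size_t distinct_in_range = 0;

    qsort(array, n, sizeof array[0], cmp_int);

    for (size_t i = 0; i < n; i++) {
        /* Sorted order puts equal values side by side,
           so duplicates are skipped with one comparison. */
        if (i > 0 && array[i] == array[i - 1])
            continue;
        if (array[i] >= lo && array[i] <= hi)
            distinct_in_range++;
    }
    printf("%zu distinct values in <%d;%d>\n", distinct_in_range, lo, hi);
}

For this input it reports 3 distinct values (4, 5, and 7), as opposed to the 4 total occurrences counted when duplicates are included.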

One final note: reading a set of 200,000 integers into an array just to filter them by some criterion is usually a bad, non-scalable way to solve a problem. Your problem (select the elements that belong to a given interval) admits a better, scalable solution by streaming: read a number, check whether it is in the interval, then output it, count it, or whatever else you want to do with it, without holding the whole set in memory first. That lets you read a truly unbounded sequence of numbers (coming e.g. from a file) and produce an output based on it:

#include <stdio.h>

#define A (2)
#define B (150000)

int main(void)
{
    int the_number;
    size_t count = 0;
    int res;

    /* Read numbers one at a time until EOF or invalid input. */
    while ((res = scanf("%d", &the_number)) > 0) {
        if (the_number >= A && the_number <= B)
            count++;
    }
    printf("%zu numbers fitted in the range\n", count);
}

With this approach you could feed the program 1.0E26 numbers (assuming you have a file system large enough to hold an input file of that size) and it would still cope; you cannot create an array with the capacity to hold 10^26 values.
