Binary search taking more time than linear search-CodePudding

I was recently studying about binary and linear search , and decided to check for real what is the difference in the actual time taken by both of these searching algorithms. Here is my code-:

#include <bits/stdc  .h>
using namespace std;

void Print(int arr[] , int size){
    for(int i = 0 ; i < size ; i  )
        cout<<arr[i]<<" ";
    cout<<endl;
}

void Merge_sort(int arr[] , int start , int end){
    
    if(start >= end){
        return ;
    }
    
    int mid = (start   end)/2;
    Merge_sort(arr , start , mid);
    Merge_sort(arr , mid   1 , end);
    
    int size = end - start   1;
    
    int arr_new[size];
    
    int i = start , j = mid   1;
    
    for(int k = 0 ; k < size ; k  ){
        if(i > mid){
            arr_new[k] = arr[j  ];
        }else if(j > end){
            arr_new[k] = arr[i  ];
        }else if(arr[i] < arr[j]){
            arr_new[k] = arr[i  ];
        }else{
            arr_new[k] = arr[j  ];
        }
    }
    int index = start ;
    for(int k = 0 ; k < size ; k  ){
        arr[index  ] = arr_new[k];
    }
}

int binary_search(int arr[] , int start , int end , int x){
    if(start > end){
        return -1;
    }
    
    int mid = (start   end)/2;
    if(arr[mid] == x){
        return mid;
    }else if(arr[mid] > x){
        return binary_search(arr , start , mid - 1 , x);
    }else{
        return binary_search(arr , mid   1 , end , x);
    }
}



int linear_search(int arr[] , int size , int x){
    for(int i = 0 ; i < size ; i  ){
        if(arr[i] == x){
            return i;
        }
    }
    return -1;
}

int main()
{
    int size;
    
    clock_t start , start1 , end , end1;
    double time1 , time2;
    
    
    cin>>size;
    
    int *arr1 = new int[size];
    int *arr2 = new int[size];
    
    for(int i = 0 ; i < size ; i  ){
        int j = rand() % 1000;
        arr1[i] = i;
        arr2[i] = i;
    }
    
    Merge_sort(arr1 , 0 , size-1);
    start = clock();
    cout<<binarySearch(arr1 , 0 , size - 1 , 15)<<endl;
    end = clock();
    time1 = (double)(end - start)/(double)(CLOCKS_PER_SEC);
    cout << "Time taken by binary search  is : \t" << fixed
        << time1 << setprecision(8);
    cout << " sec " << endl;
    
    start1 = clock();
    cout<<linear_search(arr2 , size , 15)<<endl;
    end1 = clock();
    time2 = (double)(end1 - start1)/(double)(CLOCKS_PER_SEC);
    cout << "Time taken by linear search  is : \t" << fixed
        << time2 << setprecision(8);
    cout << " sec " << endl;
    return 0;
}

I have used Merge sort to sort the array for the binary search part , but have excluded it while calculating the time taken. Kindly take a look and tell me where am I going wrong.

CodePudding user response：

I was just watching an interesting CPP Con video on YouTube about sorting, and part of it addressed linear versus binary searching.

Modern CPUs do some very interesting predictive work. They may guess ahead and can be a LOT more efficient if you follow the path they think your code is going to take. Binary searches make it impossible for that to work, so they can actually be slower on modern hardware where older hardware didn't do this, and thus binary searches were typically much faster as soon as the sample size grew to something over maybe as few as 20 values.

https://www.youtube.com/watch?v=FJJTYQYB1JQ&t=2272s

So to some extent, your sample size will matter, but this talk may also shed some light.

CodePudding user response：

If the input size is large enough, binary search is indeed faster than linear search. When computer scientists talk about one algorithm being faster than another one, they mean that the time complexity of one algorithm is better than the other one.

Binary search has worst time complexity O(lgn), where lgn means 2-based logarithm of n.

Linear search has worst time complexity O(n), where n.

In both of these cases, n is the size of the input, means the size of the set of numbers you are searching.

Formally, O(g(n)) is defined as, The set of functions f(n) such that there exists a positive integer n_0 such that for n >= n_0, there exists a constant 0 <= f(n) <= c1*g(n)

All the mathematics and formal definitions aside, in summary, binary search is better than linear search in long term. By long term I mean if your input size is big enough. I am assuming you ran the program using less numbers, like 10 or maybe a 100. But when you have a large amount of data, say, 10^5 numbers, then binary search is always going to be faster.

Also, an algorithm is better than other usually means in worst case. That is, if the number you are searching for is in the beginning of array, then on the first try you will get that number. But for binary search, it will take a few number of steps.

For linear search, the worst case is if the number that you are searching for is at the end of the list.

So in worst cases and average cases, binary search performs better.

In your program, you have used recursion for binary searching and a for loop for linear search. Recursion has more overhead than a loop. And the compiler does some smart preprocessing when you are using a loop as @JosephLarson mentioned. But if you run your code on large enough inputs, I am assuming that you will find that binary search is faster.

Also, you can compare binary search to finding a specific page number in a book. Binary search will always be faster than going one page at a time.