I'm currently studying the 'C++ Primer Plus' book by Prata. In the Chapter 16 exercises I've encountered the following exercise:
Compared to an array, a linked list features easier addition and removal of elements but is slower to sort. This raises a possibility: perhaps it might be faster to copy a list to an array, sort the array, and copy the sorted result back to the list than to simply use the list algorithm for sorting. (But it also could use more memory.) Test the speed hypothesis with the following approach:

a. Create a large vector object vi0, using rand() to provide initial values.
b. Create a second vector object vi and a list object li of the same size as the original and initialize them to the values in the original vector.
c. Time how long the program takes to sort vi using the STL sort() algorithm, then time how long it takes to sort li using the list sort() method.
d. Reset li to the unsorted contents of vi0. Time the combined operation of copying li to vi, sorting vi, and copying the result back to li.

To time these operations, you can use clock() from the ctime library. As in Listing 5.14, you can use this statement to start the first timing:

clock_t start = clock();

Then use the following at the end of the operation to get the elapsed time:

clock_t end = clock();
cout << (double)(end - start)/CLOCKS_PER_SEC;

This is by no means a definitive test because the results will depend on a variety of factors, including available memory, whether multiprocessing is going on, and the size of the array or list. (One would expect the relative efficiency advantage of the array over the list to increase with the number of elements being sorted.) Also, if you have a choice between a default build and a release build, use the release build for the measurement. With today's speedy computers, you probably will need to use as large an array as possible to get meaningful readings. You might try, for example, 100,000 elements, 1,000,000 elements, and 10,000,000 elements.
My implementation looks like this:
#include <iostream>
#include <vector>
#include <list>
#include <ctime>
#include <cstdlib>
#include <algorithm>
using namespace std;
const int MAX = 10'000'000;
int main()
{
// Seed rand() so each run produces different values
srand(time(0));
vector<int> vi0(MAX);
for( int i=0; i<MAX; ++i )
{
int r = rand();
vi0[i] = r;
}
vector<int> vi(MAX);
list<int> li;
for( int i=0; i<MAX; ++i )
{
int r = vi0[i];
vi[i] = r;
li.push_back(r);
}
clock_t start = clock();
sort( vi.begin(), vi.end() );
clock_t end = clock();
cout << "Time to sort vector 'vi': '" << (double)(end-start)/CLOCKS_PER_SEC << "'\n";
start = clock();
li.sort();
end = clock();
cout << "Time to sort list 'li': '" << (double)(end-start)/CLOCKS_PER_SEC << "'\n";
// Reset values
li.clear();
for( int i=0; i<MAX; ++i )
{
li.push_back(vi0[i]);
}
// Get time to copy values to 'vi', sort them & copy back to 'li'
start = clock();
auto x = vi.begin();
auto i = li.begin();
while( i != li.end() )
{
*x = *i;
++x;
++i;
}
sort( vi.begin(), vi.end() );
x = vi.begin();
i = li.begin();
while( x != vi.end() )
{
*i = *x;
++i;
++x;
}
end = clock();
cout << "Time to copy 'li' to 'vi'. Sort 'vi' and copy values to 'li': '" <<
(double)(end-start)/CLOCKS_PER_SEC << "'\n";
return 0;
}
Compiled the code with: g++ -O3 CompareListAndArraySorting.cc
Output:
Time to sort vector 'vi': '1.02916'
Time to sort list 'li': '7.87467'
Time to copy 'li' to 'vi'. Sort 'vi' and copy values to 'li': '3.73348'
What puzzles me is that sorting the list is slower than copying the values to a vector, sorting the vector, and copying the values back. I'm wondering:
- Is this due to machine-specific processing?
- Is the main reason for list.sort() to save the extra space needed for copying the values into a vector? If a program has no issue with the extra allocation, would copying the values to a vector, sorting, and copying back be a better solution than the slow list.sort()?

Am I missing some point of the exercise?
CodePudding user response:
Is this due to machine-specific processing?
The general answer to this question is nearly always yes. For example, if a processor provides SIMD instructions that help the sort, then it can be much faster to sort a vector than a linked list, which is not stored contiguously in memory. The speed of the memory hierarchy also plays a huge role, not to mention out-of-order execution and instruction-level parallelism. That being said, mainstream processors are very similar nowadays, so performance behaviour generally does not change drastically from one processor to another (at least as long as they are mainstream x86/ARM desktop/server processors).
Linked lists are slow because they are a non-contiguous data structure that performs many unpredictable memory accesses. As a result, they cause a lot of pipeline stalls during execution, slowing down the program. Sorting a vector will almost certainly get even faster in the near future, because recent SIMD instruction sets can speed up algorithms like Quicksort or variants of Mergesort (e.g. Bitonic sort) that are frequently used for efficiently sorting data (the STL's sort is typically an Introsort, heavily based on Quicksort). To be more precise, AVX-512 (x86-64) and SVE (ARM) will be able to speed up the partitioning step, assuming the compiler can auto-vectorize that part of the algorithm or the implementation uses SIMD-friendly code (generally not yet the case). AFAIK, the Intel TBB library is able to vectorize such sorts. Linked lists are not going to get significantly faster, because hardware vendors can hardly optimize a chain of dependent indirect memory loads (it cannot be faster than 1 cycle per access anyway).
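If you want to see the memory-access effect in isolation, here is a minimal sketch (not part of the exercise; the element count and the use of steady_clock are my own choices) that traverses both containers doing identical per-element work. Note that a freshly built list tends to get its nodes allocated roughly in order, so the gap is usually even larger for a list whose nodes have been shuffled around, e.g. by sorting:

#include <chrono>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

int main()
{
    const int n = 10'000'000;
    std::vector<int> v(n, 1);
    std::list<int> l(n, 1);

    // Same per-element work for both containers; the difference is
    // almost entirely the memory-access pattern.
    auto t0 = std::chrono::steady_clock::now();
    long long sv = std::accumulate(v.begin(), v.end(), 0LL);
    auto t1 = std::chrono::steady_clock::now();
    long long sl = std::accumulate(l.begin(), l.end(), 0LL);
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "vector traversal: " << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "list traversal:   " << std::chrono::duration<double>(t2 - t1).count() << " s\n"
              << "(sums: " << sv << ", " << sl << ")\n";
}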
Is the main reason for list.sort() to save the extra space needed for copying the values into a vector? If a program has no issue with the extra allocation, would copying the values to a vector, sorting, and copying back be a better solution than the slow list.sort()?
Not only. The speed of a copy depends on what you copy. While copying an int is very cheap, copying/moving big, complex objects can be far more expensive. Such a big object can still be fast to compare. Conversely, a basic type like an int can be a handle or a kind of pointer to a big, complex object that is slow to compare (e.g. via a comparison lambda). Thus, it is all about finding the threshold between the cost of a copy vs. the cost of the comparison operator.
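As an illustration of that trade-off, here is a minimal sketch; the BigObject type and its key field are made up for the example. Sorting pointers avoids moving the large payload on every swap, at the cost of one extra indirection per comparison, which is exactly the kind of threshold you would have to measure:

#include <algorithm>
#include <vector>

// Hypothetical element type: cheap to compare (one int), expensive to copy.
struct BigObject {
    int key;
    char payload[4096];
};

void sort_directly(std::vector<BigObject>& v)
{
    // Every swap moves ~4 KB of payload.
    std::sort(v.begin(), v.end(),
              [](const BigObject& a, const BigObject& b) { return a.key < b.key; });
}

void sort_indirectly(std::vector<BigObject>& v)
{
    // Sort pointers instead: each swap moves 8 bytes, but every
    // comparison pays one extra indirection.
    std::vector<const BigObject*> p;
    p.reserve(v.size());
    for (const BigObject& o : v) p.push_back(&o);
    std::sort(p.begin(), p.end(),
              [](const BigObject* a, const BigObject* b) { return a->key < b->key; });
    // p is now a sorted view of v; permute v afterwards if needed.
}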
In your case, using a vector is far better than a linked list. In fact, linked lists are very rarely used in high-performance code (there is almost always a better data structure).
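For completeness, the copy-sort-copy-back round trip from the exercise can be written much more compactly with the standard library; a minimal sketch with the same semantics as the hand-written loops in the question:

#include <algorithm>
#include <list>
#include <vector>

void sort_via_vector(std::list<int>& li)
{
    // Copy the list into a contiguous buffer...
    std::vector<int> vi(li.begin(), li.end());
    // ...sort it with the cache-friendly std::sort...
    std::sort(vi.begin(), vi.end());
    // ...and write the sorted values back into the existing list nodes.
    std::copy(vi.begin(), vi.end(), li.begin());
}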