Home > front end >  sorting a mmaped file with random integers in C
sorting a mmaped file with random integers in C

Time:06-21

I'm trying to create a C program that creates a txt file with random integers, and then mmap the said file and qsort it. Creating the txt and mapping goes smoothly, but I can't figure out why qsort just destroys it. My guess, it's something to do with data types, but even after playing around with them, I get more or less the same result.

compar:

int cmp(const void *p1, const void *p2)
{
    return (*(const int *)p1 - *(const int *)p2);
}

mmap and qsort:

    char *addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, myFile, 0); 
    for (int i = 0; i < size; i  )
         printf("%c", addr[i]);
    qsort(addr, 20, sizeof(char), cmp);
    printf("\n-------------\n");
    for (int i = 0; i < size; i  )
         printf("%c", addr[i]);

output example:

10
19
9
8
18
2
9
6
3
7
15
12
12
14
6
2
4
3
15
13

-------------

296

1
9
93818
1
0

7
15
12
12
14
6
2
4
3
15
13

So 20 random integers are created in txt and mapped with no problems. But I guess qsort doesn't like what's given to it. I tried having mmap as int, but that created other problems like incorrect numbers in the array and incorrect amount of numbers and some 0s (perhaps I handled the output incorrectly). I'm not exactly sure what I'm doing wrong. here is the full code in case if it's needed:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <time.h>
#include <stdlib.h>

int cmp(const void *p1, const void *p2);
void rand_txt(int size);

int main() {

    rand_txt(20);

    int myFile = open("rand.txt", O_RDWR);
    struct stat myStat = {};
    fstat(myFile, &myStat);
    off_t size = myStat.st_size;

    char *addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, myFile, 0);

    for (int i = 0; i < size; i  )
        printf("%c", addr[i]);

    qsort(addr, 20, sizeof(char), cmp);
    printf("\n-------------\n");
    for (int i = 0; i < size; i  )
        printf("%c", addr[i]);

    return 0;
}

int cmp(const void *p1, const void *p2) {
    return (*(const int*)p1 - *(const int*)p2);
}

//Function to create a txt with random integers in given size
void rand_txt(int size) {
    FILE *fp = fopen("rand.txt", "w");
    srand(time(0));
    for (int i = 0; i < size; i  )
        fprintf(fp, "%d\n", (rand() % size)   1);
    fclose(fp);
}

Also a quickie: Do the arrays and pointers behave in the same way in practice(like strings)? say we declare: int *x = { 10, 20, 30, 40 }; can we just fprint("%d\t", x[i]); or such type of arrays must be declared as int[] = { 10, 20... }; and not with pointers? (I can and will test it myself and edit the "quickie" out if there is no answer until after I am able to test and study it)

CodePudding user response:

You cannot sort a text file with numbers of different sizes by mapping the file in memory and calling qsort() with the comparison function for multiple reasons:

  • qsort needs an array where all entries have the same size, which is not the case of textual representations.
  • the comparison function assumes an array of int, not an array of textual representations
  • even for an array of int, the comparison function would be inappropriate for large absolute values as the subtraction might overflow.

To sort the text file, the classic approach is to allocate an array of int and load the values from the file, converting from text representation to int values, sort the array with qsort with an appropriate comparison function, and write the sorted array converting back to string representations.

Here is a comparison function you can use:

int cmp(const void *p1, const void *p2) {
    const int *i1 = p1;
    const int *i2 = p2;
    return (*i1 > *i2) - (*i1 < *i2);
}

CodePudding user response:

What your code does is what you wrote. There is no qsort "likes", or function "thinks".

You write text line by line, like below. Do notice the end of line characters:

10\n
19\n
9\n

The below function

int cmp(const void *p1, const void *p2)
{
    return (*(const int*)p1 - *(const int*)p2);
}

Takes two pointers, interprets them as int* and compares two ints being pointed to. Type int has typically 32-bit length. I will write your text in one line and see what will cmp function do.

  p1        p2
  |         |
  v         v
 10\n1   9\n9\n

The comparator will get bytes 10\n1, interpret them as an int. Then it will get bytes 9\n9\n, interpret them as an int. And compare those ints.

As others have pointed out in comments section, qsort may access past the data that you wrote to the file. That case, comparator will compare bytes values, that it finds in a mmaped memory.

*int x = {10, 20, 30, 40};

Turn on compiler warnings and see what will it tell you. Maybe try to print x[0], x[1]. Why are you asking that quickey? Do you trip into imaginary C language?

  • Related