How do I open an mmap of a terabyte-sized file?


I need to open a huge memory map. The file is one terabyte.

However, I am getting errno 12 (ENOMEM, "Cannot allocate memory") and I don't see what is holding me up. Querying RLIMIT_AS gives 18446744073709551615, which is plenty. My system is also 64-bit, so it is not that my virtual address space is too small, and ulimit -v is unlimited.

I created the data with Python using np.lib.format.open_memmap, so mapping a file this size is physically possible. I'm trying to read it in C. Reading it from Python is no problem: numpy.load('terabytearray.npy', mmap_mode='r') works.


Here is a minimal example.

Create a numpy array as such:

import numpy as np

shape = (75000, 5000000)
filename = 'datafile.obj'

if __name__ == '__main__':
  arr = np.lib.format.open_memmap(filename, mode='w+', dtype=np.float32, shape=shape)
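
(For reference, assuming the float32 dtype above, this array works out to 75000 × 5000000 × 4 bytes = 1.5 × 10^12 bytes, roughly 1.4 TiB plus a small .npy header, which is the amount of address space the C program below has to map.)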

read it as such:

#include <stdbool.h>
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#include <sys/time.h>
#include <sys/resource.h>

#include <stdio.h>
#include <errno.h>

typedef enum {
  CNPY_LE, /* little endian (least significant byte to most significant byte) */
  CNPY_BE, /* big endian (most significant byte to least significant byte) */
  CNPY_NE, /* no / neutral endianness (each element is a single byte) */
  /* Host endianness is not supported because it is an incredibly bad idea to
     use it for storage. */
} cnpy_byte_order;

typedef enum {
  CNPY_B = 0, /* We want to use the values as index to the following arrays. */
  CNPY_I1,
  CNPY_I2,
  CNPY_I4,
  CNPY_I8,
  CNPY_U1,
  CNPY_U2,
  CNPY_U4,
  CNPY_U8,
  CNPY_F4,
  CNPY_F8,
  CNPY_C8,
  CNPY_C16,
} cnpy_dtype;

typedef enum {
  CNPY_C_ORDER,       /* C order (row major) */
  CNPY_FORTRAN_ORDER, /* Fortran order (column major) */
} cnpy_flat_order;

typedef enum {
  CNPY_SUCCESS,      /* success */
  CNPY_ERROR_FILE,   /* some error regarding handling of a file */
  CNPY_ERROR_MMAP,   /* some error regarding mmaping a file */
  CNPY_ERROR_FORMAT, /* file format error while reading some file */
} cnpy_status;

#define CNPY_MAX_DIM 4
typedef struct {
  cnpy_byte_order byte_order;
  cnpy_dtype dtype;
  cnpy_flat_order order;
  size_t n_dim;
  size_t dims[CNPY_MAX_DIM];
  char *raw_data;
  size_t data_begin;
  size_t raw_data_size;
} cnpy_array;

cnpy_status cnpy_open(const char * const fn, bool writable, cnpy_array *arr) {
  assert(arr != NULL);

  cnpy_array tmp_arr;

  /* open, mmap, and close the file */
  int fd = open(fn, writable? O_RDWR : O_RDONLY);
  if (fd == -1) {
    return CNPY_ERROR_FILE;
  }
  size_t raw_data_size = (size_t) lseek(fd, 0, SEEK_END);
  lseek(fd, 0, SEEK_SET);
  printf("%zu\n", raw_data_size);
  if (raw_data_size == 0) {
    close(fd); /* no point in checking for errors */
    return CNPY_ERROR_FORMAT;
  }
  if (raw_data_size == SIZE_MAX) {
    /* This is just because the author is too lazy to check for overflow on every pos + 1 calculation. */
    close(fd);
    return CNPY_ERROR_FORMAT;
  }

  void *raw_data = mmap(
    NULL,
    raw_data_size,
    PROT_READ | PROT_WRITE,
    writable? MAP_SHARED : MAP_PRIVATE,
    fd,
    0 
  );

  if (raw_data == MAP_FAILED) {
    close(fd);
    return CNPY_ERROR_MMAP;
  }

  if (close(fd) != 0) {
    munmap(raw_data, raw_data_size);
    return CNPY_ERROR_FILE;
  }

  /* parse the file */
  // cnpy_status status = cnpy_parse(raw_data, raw_data_size, &tmp_arr); // library call ignore
  // if (status != CNPY_SUCCESS) {
  //   munmap(raw_data, raw_data_size);
  //   return status;
  // }
  // *arr = tmp_arr;

  return CNPY_SUCCESS;
}

int main(){

  cnpy_array arr = {0};
  cnpy_status status = cnpy_open("datafile.obj", false, &arr);

  printf("status %i\n",(int) status);
  if(status != CNPY_SUCCESS){
    printf("failure\n");
    printf("errno %i\n", errno);
  }


  struct rlimit lim;
  printf("getrlimit RLIMIT_AS %s\n", (getrlimit(RLIMIT_AS, &lim) == 0 ? "success" : "failure"));
  printf("lim.rlim_cur %llu\n", (unsigned long long) lim.rlim_cur);
  printf("lim.rlim_max %llu\n", (unsigned long long) lim.rlim_max);
  printf("RLIM_INFINITY %llu\n", (unsigned long long) RLIM_INFINITY);


  return 0;
}

compile with

gcc -std=c11 -o mmap_testing main.c

I'm using the ~quf/cnpy library to handle the numpy format; I've included the relevant parts above.

CodePudding user response:

Memory-mapping a file that you only intend to read, but doing it with mmap(..., PROT_READ | PROT_WRITE, MAP_PRIVATE, ...), requires the kernel to reserve anonymous backing store (swap space) for the entire mapping. This is because the application could modify any or even all of the data after it is mapped, and the kernel must have somewhere to put those modified private pages if they need to be swapped out.

If a file is mapped with read/write permissions and mmap(..., MAP_SHARED, ...), the file itself is the "backing store" because if any data needs to be swapped out, it can be written to the file itself. Thus a MAP_SHARED mapping does not need any swap space reservations.

It is also theoretically possible for a mmap(..., PROT_READ, MAP_PRIVATE, ...) mapping to be done without reserving swap space, since the process cannot modify the data and the pages could simply be re-read from the file if they are evicted. Whether that is valid, though, depends on what "read-only" is taken to mean: the data as it was in the file when first read from disk, or whatever the file happens to contain later if the pages have to be re-read after being swapped out.

Neither Linux nor POSIX specifies the behavior. The Linux man page says:

It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.

and POSIX says:

It is unspecified whether modifications to the underlying object done after the MAP_PRIVATE mapping is established are visible through the MAP_PRIVATE mapping.

The Linux man page does state the following for MAP_PRIVATE:

Create a private copy-on-write mapping. Updates to the mapping are not visible to other processes mapping the same file

The use of the phrase "copy-on-write mapping" implies that Linux will always reserve swap space for a MAP_PRIVATE mapping (but this remains untested...).
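
Applied to the question's cnpy_open, this suggests requesting only the access you actually need, and letting the file itself act as backing store when you do want to write. A minimal sketch of how the mmap() call could look (an illustrative adaptation, not the actual ~quf/cnpy code; fd, raw_data_size and writable are as in the question):

void *raw_data = mmap(
  NULL,
  raw_data_size,
  writable ? (PROT_READ | PROT_WRITE) : PROT_READ, /* only request write access when needed */
  writable ? MAP_SHARED : MAP_PRIVATE,             /* MAP_SHARED is backed by the file itself */
  fd,
  0
);
if (raw_data == MAP_FAILED) {
  close(fd);
  return CNPY_ERROR_MMAP;
}

Whether the read-only PROT_READ / MAP_PRIVATE case still reserves swap is, as discussed above, not something the documentation guarantees.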

MAP_NORESERVE

Note also, Linux provides the MAP_NORESERVE option:

Do not reserve swap space for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify the mapping. When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available. See also the discussion of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before 2.6, this flag had effect only for private writable mappings.
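
If you want to keep the question's PROT_READ | PROT_WRITE, MAP_PRIVATE mapping, a hedged sketch of the same call with MAP_NORESERVE added (the assumption being that you accept a later write could fail under memory pressure, as the man page warns):

void *raw_data = mmap(
  NULL,
  raw_data_size,               /* file size obtained via lseek(), as in the question */
  PROT_READ | PROT_WRITE,
  MAP_PRIVATE | MAP_NORESERVE, /* do not reserve swap for the private copy-on-write pages */
  fd,
  0
);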

CodePudding user response:

The problem is the setting of /proc/sys/vm/overcommit_memory. A full explanation can be found in this answer.

Essentially, with the default setting the kernel uses a heuristic to decide whether a mapping can be allowed, rather than the reasoning above, and so it refuses the huge private mapping. Setting the value to 1 (always overcommit) fixes it.
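
For reference, on a typical Linux system that setting can be changed (as root) with either of the following; it resets on reboot unless made persistent via sysctl configuration:

echo 1 > /proc/sys/vm/overcommit_memory
# or equivalently
sysctl -w vm.overcommit_memory=1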
