Best practices for a large array of integers in C++

Time:08-01

I am loading a 123 MB file of unsigned integers into memory in C++ (it needs to be in memory for fast lookups in a Monte Carlo simulation). Right now I have a global array, but I've heard global arrays are frowned upon. What are the best practices for this?

For context, I'm running Monte Carlo simulations of a poker game and need an array of about 32.5 million integers to quickly determine the winner of a poker hand. You first compute each hand's 'handrank' by doing 7 lookups into the array, and then you compare the handranks to find the winner.

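#include <array>
#include <cstdio>
#include <cstring>
#include <iostream>

int HR[32487834];

int get_handrank(const std::array<int, 7>& cards)
{
  int p = 53;
  for (const auto& c : cards)
    p = HR[p + c];  // each card moves one step through the lookup table
  return p;
}

int main()
{
  // load the data
  std::memset(HR, 0, sizeof(HR));  // redundant: globals are already zero-initialized
  std::FILE* fin = std::fopen("handranks.dat", "rb");
  if (!fin) {
    std::cout << "error when loading handranks.dat" << std::endl;
    return 1;
  }
  std::size_t itemsread = std::fread(HR, sizeof(HR), 1, fin);  // 1 on success, 0 on a short read
  std::fclose(fin);
  if (itemsread != 1) {
    std::cout << "handranks.dat is shorter than expected" << std::endl;
    return 1;
  }
  std::cout << "complete.\n\n";

  // monte carlo simulations using get_handrank() function
  // ...
}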

Answer:

  1. Use local variables and pass them to functions as needed. This makes the program easier to reason about.

  2. Use std::vector instead of an array for this much data. A local array of ~32.5 million ints (about 124 MiB) would almost certainly overflow the stack, whereas std::vector stores its elements on the heap.

  3. Change your functions to take a std::span (C++20) instead of a particular container. This decouples them from how the data is stored.

  4. Define a named constant for 32487834 instead of using it as a magic number.

Answer:

Using local variables and passing them around is always better practice than using globals, IMO. It makes your algorithms more flexible: what if you later need to use a different array? You would have to modify every function that uses the global. Passing the array around is a bit inconvenient and verbose, I agree, but it is still better than globals. We will address this problem later on.

So your first option is something like:

#include <array>
#include <cstdio>
#include <iostream>

// Don't forget to receive the params as references to avoid copying
int get_handrank(const std::array<int, 7>& cards, const std::array<int, 32487834>& HR)
{
  int p = 53;
  for (const auto& c : cards)
    p = HR[p + c];
  return p;
}

int main()
{
  std::array<int, 32487834> HR{};  // zero-inits the array, but ~124 MiB will likely overflow a typical stack
  // or std::vector<int> HR(32487834); to keep the data on the heap, as @MarkRansom pointed out

  std::FILE* fin = std::fopen("handranks.dat", "rb");
  if (!fin) {
    std::cout << "error when loading handranks.dat" << std::endl;
    return 1;
  }
  std::size_t itemsread = std::fread(HR.data(), HR.size() * sizeof(int), 1, fin);
  std::fclose(fin);
  if (itemsread != 1)
    return 1;  // short read: the file is smaller than expected
}

Second approach, even better: use a class. You can have a Simulation class that holds the HR array as a private member, read and initialized in the constructor. It can also be a const member if you initialize it from a function. Then get_handrank can be a member function that accesses the HR member:

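#include <array>
#include <cstdio>
#include <stdexcept>

class Simulation {
public:
  int get_handrank(const std::array<int, 7>& cards) const
  {
    int p = 53;
    for (const auto& c : cards)
      p = HR[p + c];
    return p;
  }

  Simulation()
    : HR{}
    // or HR{readHRFromFileFunction()}
  {
    std::FILE* fin = std::fopen("handranks.dat", "rb");
    if (!fin)
      throw std::runtime_error("error when loading handranks.dat");  // a constructor cannot return an error code
    std::size_t itemsread = std::fread(HR.data(), HR.size() * sizeof(int), 1, fin);
    std::fclose(fin);
    if (itemsread != 1)
      throw std::runtime_error("handranks.dat is shorter than expected");
  }

private:
  std::array<int, 32487834> HR;
  // or const std::array<int, 32487834> HR; if you use a function to init it
  // (~124 MiB: allocate Simulation on the heap, e.g. with std::make_unique, not as a local)
};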