Home > Back-end >  Compile Time Dictionary or Array C#
Compile Time Dictionary or Array C#

Time:09-19

Is it possible to create a constant dictionary or array at compile time in C#? If not, is it possible to do this in C or C and compile to a dll that can be pulled into a C# project?

I need to create a dictionary with about 130M members, and generating it reasonably fast is consuming a lot of my time. I imagine the fastest way to do it would be to programmatically generate a huge text file and compile it to a constant array one time and then never pay that price again. I'm just not finding this feature in C#. I feel like I could do this in C.

Edit 1: This takes up about 1.5GB in RAM, if I'm not mistaken. Many useful comments have exposed the fact that it would take up at least this much on disk as an exe, which is probably not deployable. However, I am still interested in the answer for problems that are less enormous.

Edit 2: Understanding the application may answer a lot of the questions. This is a monte carlo simulator. A card is one bit, from 2clubs =0x1 to Aspades = (0x1<<52). A hand is a ulong with 7 bits set, indicating the cards in the hand.

This storage scheme allows me to do bitwise operations in lieu of cumbersome logic. For instance, to detect a flush, I can mask all hearts and then do popcount > 4, then repeat for clubs, diamonds, spades.

Then you have to determine the value of a hand. But if you are in the middle of permuting the cards, this is trivial, because the high card is given by the permutation step. If you are not in the middle of a permutation then determining the rank is much more cumbersome. This is why generating values on demand is undesirable. You have to pay the cost somewhere, but the cost is much higher on the fly than it is up front.

The end goal is to get them into a dictionary, so that determining the better of 2 hands is a dictionary lookup.

The fastest way to do this (that I've come up with) is store all hands/ranks in a tuplet array, then do ToDictionary(). The downside is that arrays obviously have no duplicate checking. This means that, in the flush example, I have to check that there is no straight flush before pushing to the array, whereas in a dictionary, I can just put the straight flushes in first, then try to push the flushes in. So you pay the cost of checking for straight flushes or you pay the cost of pushing 1 item at a time into a dictionary.

The dictionary keys are all possible 7 card poker hands. The values are the relative strength of each hand, e.g., straight flush to ace has value zero.

CodePudding user response:

I'd rather try the following:

Create the dictionary in memory once then serialize it and save it to a file using this.

Then you can simply load the dictionary from the file when your application is starting. That way you won't end up with a 130 MB exe. If you really want it to be part of your executable, you can still embed it as a binary resource.

CodePudding user response:

I made a few measurements to evaluate the cost of initializing an in-memory database of this size.

Using a Dictionary<ulong, int>:

Dictionary<ulong, int> dict = new(130_000_000);
for (int i = 0; i < 130_000_000; i  ) dict.Add((ulong)i, i);

Result:

Duration: 20,030 msec, Allocated: 3,640,002,808 bytes

Using a SortedList<ulong, int>:

SortedList<ulong, int> dict = new(130_000_000);
for (int i = 0; i < 130_000_000; i  ) dict.Add((ulong)i, i);

Result:

Duration: 32,979 msec, Allocated: 1,560,003,920 bytes

Using two arrays (ulong[] and int[]), filling them with pre-sorted data, and intending to use them with the Array.BinarySearch<T> method:

ulong[] keys = new ulong[130_000_000];
int[] values = new int[130_000_000];
for (int i = 0; i < 130_000_000; i  ) keys[i] = (ulong)i;
for (int i = 0; i < 130_000_000; i  ) values[i] = i;

Result:

Duration: 1,707 msec, Allocated: 1,560,000,840 bytes

My PC is quite old and has only 4GB RAM, so these experiments could yield significantly different results in a higher-end machine.

My observations are:

  • The Dictionary has the higher cost regarding the required memory, and the duration of initialization is significant. It should be the most performant option though, for doing searches.

  • The SortedList has low memory cost, but even higher initialization cost. Searching involves binary search, so it should be as performant as the next option.

  • The ulong[]/int[] array pair has low memory cost and low initialization cost. Doing a binary search on a 130,000,000-elements array requires at most 28 ulong comparisons, so it should be slower than the Dictionary, but not terribly slow.

My conclusion is that you might want to invest some time for researching and implementing a custom data structure, or looking for alternative options like in-memory databases. The data structures that are built-in the .NET standard libraries are probably inadequate for such a heavy load, combined with a fast-initialization requirement.

  • Related