This question is kind of long-detailed and took a lot of my time; hope you read it till end; thanks for your patience
I've find out that size of data in some programming languages like python
isn't really matter, and they can store very big values like huge numbers; and then increase them to something even bigger.
I work with c
and I know in that language, variables must have a particular type and size. To store huge integers (which is very bigger than the largest type in c), I know 2 different ways:
First method: Using char pointer and store each digit at particular index
- First value of pointer can tell us the sign.
- Pointer can stop at particular value (like
\0
in strings but now we use$
).
to store 123456789 pointer would be like this:
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9| $ |
^ ^
| |
sign finisher
Second method: Using unsigned int pointer and store numbers like computers store binaries (I mean store them based on size of int)
- First value of pointer can tell us the sign
- Second value of pointer can tell us the length
to store 123456789 pointer would be like this:
// 123456789 MOD 2^32 = 123456789
| | 1 |123456789|
^ ^ ^
| | |
sign length value
and to store something bigger like 9998877665544332211 the pointer would be as in below:
// 9998877665544332211 / 2^32 = 2328045122
// 9998877665544332211 MOD 2^32 = 2942002099
| | 2 |2942002099|2328045122|
^ ^ ^ ^
| | | |
sign length ---------
|
values
For simplification I don't use only first value of pointer to keep both sign and size.
issues:
If I use First method, it will waste a lot of memory (e.g. for 18-digit First method needs 20 bytes and Second method needs 16 bytes; the difference will be greater in bigger values).
Second method needs more calculation if we want print it as decimal.
Questions:
Which method you suggest me to use?
Which method is more similar to that one languages like python use?
Is there any better method?
PS: Yeah, using struct
can make it easier, but that's not my point; I just wanna know storing each digit separately is better than storing number based on int's size or not.
Please don't refer me to external libraries like GMP
. I wanna know what such libraries internally do.
CodePudding user response:
Neither is ideal. For both storage and speed the best way (in a generic context) is having your number binary encoded in 2s complement in an array of bits grouped by the largest data size of the underlying architecture (e.g. unsigned long long
for x64 or unsigned int
for x86
).
more calculation if we want print it as decimal.
Unless you have a very specific use case, most of the time is spent in calculations between bignums. To and from decimal conversion is usually done only on IO which is slow anyway so it doesn't matter that much. Slower conversion to and from decimal is an acceptable tradeoff in favor of bignum calculation performance.
I definitely don't recommend building your own bignum library except for academic purposes (which can be an excellent exercise if you are into this). Beyond difficulty of implementation, making it fast requires deep knowledge and vast experience in system architectures, operating systems and compilers. Just use an existing proven library like bignum, tiny-bignum, GMP etc.
wanna know what such libraries internally do.
Most of them are open source so ... go look around!
CodePudding user response:
What you're looking for is arbitrary precision arithmetic or "bignum". It rapidly gets complicated to do it efficiently and is way too much to explain in an answer here.
Is there any better method?
Yes, use an existing library.
Which method is more similar to that one languages like python use?
libmpedc is what Python uses. Ruby will use GMP if available.
As an example, this is the decimal struct from libmpedc.
#include <mpdecimal.h>
typedef struct mpd_t {
uint8_t flags; // [memory flags] | [specials] | sign
mpd_ssize_t exp; // exponent
mpd_ssize_t digits; // number of decimal digits
mpd_ssize_t len; // number of words in use
mpd_ssize_t alloc; // number of allocated words
mpd_uint_t *data; // words
} mpd_t;
It's using unsigned integers, but also a lot of other tricks.
I would suggest poking around in their source code to learn more. Particularly mpd_set_string
(creating a decimal from a string) and the basic arithmetic functions.