I'm hitting a memory-limit-exceeded error because my naive approach allocates an array with 2^25 entries. By using the timestamp as the array index I can locate a timestamp and add to its occurrence count in O(1) time, but it wastes a lot of memory. Timestamps are grouped by hour with time_stamp - (time_stamp % 3600), and start_time is subtracted so the index doesn't overflow the array.
How can I reduce the memory used while still reading a timestamp's occurrence count quickly? *The input timestamps can arrive in any order.
#include <stdio.h>
#include <stdlib.h>

#define max_entry (1 << 25)      /* 2^25 entries, one slot per second of the covered range */
#define start_time 1645491600

struct pair
{
    long timestamp;
    int occurrence;
};

struct pair *counter_array = calloc(max_entry, sizeof(struct pair)); /* allocate the pair array */

/* read the file */
while (fgets(buffer, buffer_size, common_file) != NULL)
{
    char *temp;
    long time_stamp = strtol(buffer, &temp, 10);
    long hour = time_stamp - (time_stamp % 3600);        /* round down to the hour */
    counter_array[hour - start_time].timestamp = hour;
    counter_array[hour - start_time].occurrence += 1;    /* count this timestamp's occurrence */
}
I know using max_entry is not good, but I really can't figure out how to modify it.
CodePudding user response:
- Since you're rounding time_stamps to hours, just use
time_stamp_index = (time_stamp - start_time) / 3600;
- Modify your struct to store frequency & hours:
struct hour_freq {
int hours;
int freq;
};
- If time_stamp can go up to INT_MAX (2^31 - 1), i.e. 2147483647, then max_index will be
max_index = (2147483647 - 1645491600) / 3600 = 139,442
which is well within a memory limit of 50 MiB. With 45 MiB you can index time_stamps up to 22,879,155,600:
((22879155600 - 1645491600) / 3600) * 8 = 47,185,920 bytes = 45 MiB
- To find the frequency of a time_stamp at hour granularity, calculate time_stamp_index and look it up.
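Putting those pieces together, a minimal sketch of the hour-bucket counter could look like the following. The file name, buffer size, query value, and the END_TIME upper bound are assumptions for illustration; the indexing formula and struct follow the points above.

#include <stdio.h>
#include <stdlib.h>

#define START_TIME 1645491600L                             /* same epoch offset as in the question */
#define END_TIME   2147483647L                             /* assumed upper bound (INT_MAX) */
#define NUM_HOURS  ((END_TIME - START_TIME) / 3600 + 1)    /* ~139,443 one-hour buckets */

struct hour_freq {
    int hours;   /* hour index relative to START_TIME */
    int freq;    /* how many timestamps fell into this hour */
};

int main(void)
{
    /* ~139k entries * 8 bytes ~= 1.1 MiB, far below a 50 MiB limit */
    struct hour_freq *counts = calloc(NUM_HOURS, sizeof *counts);
    if (counts == NULL)
        return 1;

    FILE *common_file = fopen("timestamps.txt", "r");      /* file name is an assumption */
    if (common_file == NULL)
        return 1;

    char buffer[64];
    while (fgets(buffer, sizeof buffer, common_file) != NULL) {
        long time_stamp = strtol(buffer, NULL, 10);
        long idx = (time_stamp - START_TIME) / 3600;       /* one slot per hour */
        if (idx < 0 || idx >= NUM_HOURS)
            continue;                                      /* skip out-of-range input */
        counts[idx].hours = (int)idx;
        counts[idx].freq += 1;
    }
    fclose(common_file);

    /* lookup: frequency of an arbitrary timestamp at hour granularity */
    long query = 1645495200;                               /* hypothetical query value */
    printf("%d\n", counts[(query - START_TIME) / 3600].freq);

    free(counts);
    return 0;
}

Because the input timestamps can arrive in any order, the division-based index works without sorting: every timestamp in the same hour maps to the same slot.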