I'm hitting a memory-limit-exceeded error because my naive approach allocates an array with 2^25 entries. By using the timestamp as the array index I can locate a timestamp and add to its occurrence count in O(1) time, but it wastes a lot of memory. Timestamps are grouped by hour with time_stamp - (time_stamp % 3600), and start_time is subtracted so the index doesn't overflow the array.
How can I reduce the memory used while still reading a timestamp's occurrence count quickly? *The input timestamps can arrive in any order.
#include <stdio.h>
#include <stdlib.h>

#define max_entry (1 << 25)      /* 2^25 entries, one slot per second of the covered range */
#define start_time 1645491600

struct pair
{
    long timestamp;
    int occurrence;
};

struct pair *counter_array = calloc(max_entry, sizeof(struct pair)); /* allocate the pair array */

/* read the file */
while (fgets(buffer, buffer_size, common_file) != NULL)
{
    char *temp;
    long time_stamp = strtol(buffer, &temp, 10);
    long hour = time_stamp - (time_stamp % 3600);        /* round down to the hour */
    counter_array[hour - start_time].timestamp = hour;
    counter_array[hour - start_time].occurrence += 1;    /* count this timestamp's occurrence */
}
I know using max_entry is not good, but I really can't figure out how to modify it.
CodePudding user response:
- Since you're rounding time_stamps to hours, just use
time_stamp_index = (time_stamp - start_time) / 3600;
- Modify your struct to store frequency & hours:
struct hour_freq {
int hours;
int freq;
};
- If time_stamp can go up to INT_MAX (2^31 - 1), i.e. 2147483647, then max_index will be
max_index = (2147483647 - 1645491600) / 3600 = 139,442
which is well within a memory limit of 50 MiB. With 45 MiB you can index time_stamps up to 22,879,155,600:
((22879155600 - 1645491600) / 3600) * 8 = 47,185,920 bytes = 45 MiB
- To find the frequency of a time_stamp at hour granularity, calculate time_stamp_index and look it up.
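Putting those pieces together, a minimal sketch of the hour-bucket counter could look like the following. The file name, buffer size, query value, and the END_TIME upper bound are assumptions for illustration; the indexing formula and struct follow the points above.

#include <stdio.h>
#include <stdlib.h>

#define START_TIME 1645491600L                             /* same epoch offset as in the question */
#define END_TIME   2147483647L                             /* assumed upper bound (INT_MAX) */
#define NUM_HOURS  ((END_TIME - START_TIME) / 3600 + 1)    /* ~139,443 one-hour buckets */

struct hour_freq {
    int hours;   /* hour index relative to START_TIME */
    int freq;    /* how many timestamps fell into this hour */
};

int main(void)
{
    /* ~139k entries * 8 bytes ~= 1.1 MiB, far below a 50 MiB limit */
    struct hour_freq *counts = calloc(NUM_HOURS, sizeof *counts);
    if (counts == NULL)
        return 1;

    FILE *common_file = fopen("timestamps.txt", "r");      /* file name is an assumption */
    if (common_file == NULL)
        return 1;

    char buffer[64];
    while (fgets(buffer, sizeof buffer, common_file) != NULL) {
        long time_stamp = strtol(buffer, NULL, 10);
        long idx = (time_stamp - START_TIME) / 3600;       /* one slot per hour */
        if (idx < 0 || idx >= NUM_HOURS)
            continue;                                      /* skip out-of-range input */
        counts[idx].hours = (int)idx;
        counts[idx].freq += 1;
    }
    fclose(common_file);

    /* lookup: frequency of an arbitrary timestamp at hour granularity */
    long query = 1645495200;                               /* hypothetical query value */
    printf("%d\n", counts[(query - START_TIME) / 3600].freq);

    free(counts);
    return 0;
}

Because the input timestamps can arrive in any order, the division-based index works without sorting: every timestamp in the same hour maps to the same slot.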