Home > Mobile >  How to get the frequency of each time stamp during read file and reduce the memory
How to get the frequency of each time stamp during read file and reduce the memory

Time:04-12

I stuck in memory limit exceed because I use a naive method to finish the task which is creating a array with size 2^25. By using the index of the array to represent the timestamp, I can locate the timestamp and add its occurrence in O(1) time, but it waste lots of memory. So I want to know if there are any method that can reduce the memory. The time stamp will group by hour by time_stamp - (time_stamp % 3600), it will minus start_time to avoid index overflow.

How to reduce the memory used and read the timestamp&occurrence quickly? *the input timestamps can in any order

#define max_entry 2 << 25
#define start_time 1645491600
struct pair
{
    long timestamp;
    int occurrence;
};

struct pair *counter_array = (struct pair *)calloc(max_entry, sizeof(struct pair)); // declear pair array

//readfile
char *temp;
 
while (fgets(buffer, buffer_size, common_file) != NULL)
{
     char *temp;
     long time_stamp = strtol(buffer, &temp, 10);                                                               
     counter_array[time_stamp - (time_stamp % 3600) - start_time].timestamp = time_stamp - (time_stamp % 3600);
     counter_array[time_stamp - (time_stamp % 3600) - start_time].occurrence  = 1;                           
}

I know using max_entry is not good, but I really can't figure out how to modify it.

CodePudding user response:

  1. Since you're rounding time_stamps to hours, just use
time_stamp_index = (time_stamp - start_time) / 3600;
  1. Modify your struct to store frequency & hours :
struct hour_freq {
    int hours;
    int freq;
};
  1. If time_stamp can go up to INT_MAX (231 - 1) i.e. 2147483647. Then max_index will be
max_index = (2147483647 - 1645491600) / 3600 = 139,442

well within memory limits of 50 MiB. With 45 MiB you can index time_stamps upto 22,879,155,600.

(( 22879155600 − 1645491600 ) ÷ 3600 ) × 8 = 47,185,920 octets = 45 MiB
  1. To find frequency of a time_stamp at hour granularity, calculate time_stamp_index and lookup.
  • Related