Assume I am processing a very large text file, and I have the following pseudocode:
xx_valueList = []
lines = []
for line in file:
    xx_value = calc_xxValue(line)
    xx_valueList.append(xx_value)
    lines.append(line)

# get_quantile_value returns the cutoff value at a given quantile percentage
cut_offvalue = get_quantile_value(xx_valueList, percent=0.05)

for line in lines:
    if calc_xxValue(line) > cut_offvalue:
        # do something here
Note that the file is very large and may come from a pipe, so I don't want to read it twice. The entire file has to be read before the cutoff is known, which is why every line is buffered. The method above works, but it consumes too much memory. Is there an algorithmic optimization that can improve efficiency and reduce memory consumption?
CodePudding user response:
xx_value_list = []
cut_offvalue = 0

with open(file, 'r') as f:
    for line in f:
        xx_value = calc_xxValue(line)
        xx_value_list.append(xx_value)
        # Refresh the cutoff estimate every 100 lines instead of waiting
        # for the end of the file; the estimate converges as more lines
        # are seen, so lines can be processed while still streaming.
        if len(xx_value_list) % 100 == 0:
            cut_offvalue = get_quantile_value(xx_value_list, percent=0.05)
        # Same filter as in the question, applied with the running estimate.
        if xx_value > cut_offvalue:
            # do something here
            pass
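
Note that the loop above filters with a running estimate: the first lines are compared against a cutoff of 0, and every later line against whatever the sample so far suggests, so the result is only approximate. If the exact quantile is needed while still reading the input only once, another option is to spool the lines to a temporary file during the first pass and keep only the numeric values in memory; the lines then live on disk instead of RAM, and the "second pass" re-reads the spool rather than the pipe. A minimal sketch of that idea, assuming the asker's calc_xxValue and using statistics.quantiles from the standard library in place of get_quantile_value:

import statistics
import tempfile

def filter_stream(stream):
    xx_values = []
    # Spool lines to disk so a non-seekable pipe can effectively be "read twice".
    with tempfile.TemporaryFile(mode='w+') as spool:
        for line in stream:
            xx_values.append(calc_xxValue(line))
            spool.write(line)
        # Exact cutoff: with n=20, the first cut point is the 5th percentile.
        cut_offvalue = statistics.quantiles(xx_values, n=20)[0]
        spool.seek(0)
        # zip pairs each spooled line with its already-computed value,
        # so calc_xxValue is not called a second time.
        for xx_value, line in zip(xx_values, spool):
            if xx_value > cut_offvalue:
                # do something here
                pass

This still keeps one float per input line in memory; if even that is too much, a fixed-size random sample of the values (reservoir sampling) gives an approximate cutoff in constant memory.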