I have a loop that writes out processed data to files. The data is contained in a linked list and is written out in a while loop that iterates over the list.
My question is: should I open/close the output files before/after the while loop as shown below, or should I move the open/close inside the loop so that each iteration opens and then closes the file, rather than keeping the files open for the entire duration of the loop?
The linked list holds hundreds of MB of data, potentially several GB. Each iteration writes out a line that isn't much more than 80 characters. This is run on a modern Linux system.
EDIT: this is not an embedded/critical system. In the unlikely event the process is interrupted before closing the file, it will be restarted.
void msgs_output_to_file(Node *head) {
    Node *l = head;
    MsgType msg_type;
    char line[MAX_LINE_SIZE];
    FILE *file1 = NULL;
    FILE *file2 = NULL;
    file1 = fopen("file1.csv", "w");
    file2 = fopen("file2.csv", "w");
    while (l && l->data) {
        memset(line, 0, (size_t) MAX_LINE_SIZE);
        sprintf_msg(line, l->data);
        line[strlen(line)] = '\n'; // add new line
        msg_type = get_msg_type(l->data);
        switch (msg_type) {
            case TYPE_1:
                fputs(line, file1);
                break;
            case TYPE_2:
                fputs(line, file2);
                break;
            default:
                break;
        }
        l = l->next;
    }
    fclose(file1);
    fclose(file2);
}
CodePudding user response:
The general rule is that you should open the file only once (so outside the loop) and keep it open for the duration of the whole loop. The rationale is that opening a file is a rather expensive operation.
But (as with any general rule) there are exceptions... What matters is what happens if your program crashes in the middle of the loop. If the consequences are only that you will have to restart the job, and the probability of that is low, just move on. If you are processing mission-critical data that will be lost if the resulting file ends up broken, and if (whatever the cause) crashes are to be expected, then things are different. You will have to strike a balance between performance (only one open/close) and robustness. At the very least you should flush the file every n rows (n being the maximum number of rows you can accept losing) to minimize the possible data loss.
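For illustration, here is a minimal sketch of that periodic flush inside the question's loop; the FLUSH_EVERY constant and the rows counter are made up for this example and are not part of the original code:

    #define FLUSH_EVERY 1000   /* illustrative: the max rows you accept losing */

    size_t rows = 0;
    while (l && l->data) {
        /* ... build the line and fputs() it to file1 or file2 as before ... */
        if (++rows % FLUSH_EVERY == 0) {
            fflush(file1);   /* push the stdio buffer out to the OS */
            fflush(file2);
        }
        l = l->next;
    }

fflush() only hands the buffered data to the kernel; if you need it on disk even across a power loss, you would additionally need something like fsync() on the underlying file descriptor.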
CodePudding user response:
What I understand about opening a file (in Python) is that you do not load the entire file into memory, but rather receive a file handle. Using this file handle, your script has access to "high level" operations like reading and writing a line. When you write a line, the contents are not written directly to the hard disk. Instead they are placed in a buffer, and the buffer is actually written out by the operating system when it decides to, or when you close the file using fclose().
The advantage of this buffering is that when you are writing to a file in a loop, the number of write operations to the hard disk is limited, since the content goes to the buffer rather than directly to the disk. The buffer contents are then written to the disk when enough data has accumulated or when you close the file. If you opened and closed the file on each iteration, you would increase the number of write operations to the disk and add overhead.
So the short answer is: I would open the files before the loop and close them after the loop.
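As a rough standalone demonstration of this buffering behaviour (the file name, buffer size, and row count here are arbitrary example values, not taken from the question):

    #include <stdio.h>

    int main(void) {
        FILE *out = fopen("file1.csv", "w");
        if (!out)
            return 1;

        /* Give stdio an explicit 64 KiB buffer: the fprintf() calls below land
         * in this buffer and only reach the disk when it fills up or when the
         * file is closed. */
        static char buf[64 * 1024];
        setvbuf(out, buf, _IOFBF, sizeof buf);

        for (int i = 0; i < 100000; i++)
            fprintf(out, "row %d: roughly eighty characters of payload per line\n", i);

        fclose(out);   /* flushes whatever is still sitting in the buffer */
        return 0;
    }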