I have written the following code (slightly simplified here):
FILE *sysfs_file = fopen("/sys/file", "rb");
if (sysfs_file != NULL) {
    /* Loop over a second file handle until EOF to get filesize in bytes */
    FILE *sysfs_file_get_size = fopen("/sys/file", "rb");
    int d = fgetc(sysfs_file_get_size);   /* int, not char, so EOF can be detected */
    int filesize = 0;
    while (d != EOF) {
        d = fgetc(sysfs_file_get_size);
        filesize++;
    }
    fclose(sysfs_file_get_size);

    /* Allocate buffer (plus room for a null terminator) and copy file into it */
    char *buf = malloc(filesize + 1);
    int c = fgetc(sysfs_file);
    for (int i = 0; i < filesize; i++)
    {
        buf[i] = c;
        c = fgetc(sysfs_file);
    }
    buf[filesize] = '\0';                 /* strstr requires a null-terminated string */
    fclose(sysfs_file);
    if (strstr(buf, "foo")) {
        printf("bar.\n");
    }
    free(buf);
}
For security reasons, it seemed better not to assume the size of the file, and instead to first loop through the file to count how many bytes it consists of.
The usual methods of determining the file size, such as fseek() or stat(), do not work, because the kernel generates the file's contents at the moment it is read. What I would like to know: is there a way to read the file into a buffer securely, without having to open the file twice?
Answer:
First of all, in the line
FILE *sysfs_file = fopen("/sys/file", "rb");
the "rb" mode does not make sense. If, as you write, you are looking for a "string", then the file is probably a text file, not a binary file. In that case, you should use "r" instead.
On a POSIX-compliant platform (e.g. Linux), there is no difference between text mode and binary mode. In that case, it makes even less sense to explicitly request binary mode for a text file (although doing so is not wrong).
For security reasons, it seemed better not to assume the size of the file, and instead to first loop through the file to count how many bytes it consists of.
It is not a security issue as long as you limit the number of bytes read to the size of the allocated memory buffer, i.e. to the number of bytes the file originally had. If the file has grown in the meantime, its contents will merely be truncated (which is generally not a security issue).
However, if you want to ensure that the file is not truncated, then it would probably be best to ignore the initial size of the file and simply read as much as possible from it, until you encounter end-of-file. If the initial buffer is not large enough to store the entire file, you can use the function realloc to resize the buffer.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//This function will return a pointer to a dynamically
//allocated memory buffer which contains the file data as
//a string (i.e. that is terminated by a null character).
//The function "free" should be called on this data when it
//is no longer required.
char *create_buffer_with_file_data_as_string( FILE *fp )
{
    char *buffer = NULL;
    size_t buffer_size = 16384;
    size_t valid_bytes_in_buffer = 0;

    for (;;) //infinite loop, equivalent to while(1)
    {
        size_t bytes_to_read, bytes_read;
        char *temp;

        //(re)allocate buffer to desired size
        temp = realloc( buffer, buffer_size );
        if ( temp == NULL )
        {
            fprintf( stderr, "Realloc error!\n" );
            free( buffer );
            return NULL;
        }

        //(re)allocation was successful, so we can overwrite the
        //pointer "buffer"
        buffer = temp;

        //calculate number of bytes to read from input file
        //note that we must leave room for adding the terminating
        //null character
        bytes_to_read = buffer_size - valid_bytes_in_buffer - 1;

        //attempt to fill buffer as much as possible with data from
        //the input file, appending after the bytes already read
        bytes_read = fread(
            buffer + valid_bytes_in_buffer,
            1,
            bytes_to_read,
            fp
        );

        //break out of loop if there is no data to process
        if ( bytes_read == 0 )
            break;

        //update number of valid bytes in the buffer
        valid_bytes_in_buffer += bytes_read;

        //double the size of the buffer (will take effect in
        //the next loop iteration)
        buffer_size *= 2;
    }

    //verify that no error occurred
    if ( ferror( fp ) )
    {
        fprintf( stderr, "File I/O error occurred!\n" );
        free( buffer );
        return NULL;
    }

    //add terminating null character to data, so that it is a
    //valid string that can be passed to the function "strstr"
    //(the terminator is now counted as a valid byte)
    buffer[valid_bytes_in_buffer++] = '\0';

    //shrink buffer to required size
    {
        char *temp;

        temp = realloc( buffer, valid_bytes_in_buffer );
        if ( temp == NULL )
        {
            fprintf( stderr, "Warning: Shrinking failed!\n" );
        }
        else
        {
            buffer = temp;
        }
    }

    //the function was successful, so return a pointer to
    //the data
    return buffer;
}

int main( void )
{
    FILE *fp;
    char *data;

    //attempt to open file
    fp = fopen( "filename", "r" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //call the function
    data = create_buffer_with_file_data_as_string( fp );
    if ( data == NULL )
    {
        fprintf(
            stderr,
            "An error occurred in the function:\n"
            "    create_buffer_with_file_data_as_string\n"
        );
        fclose( fp );
        exit( EXIT_FAILURE );
    }

    //the file is no longer needed, so close it
    fclose( fp );

    //search data for target string
    if ( strstr( data, "target" ) != NULL )
    {
        printf( "Found \"target\".\n" );
    }
    else
    {
        printf( "Did not find \"target\".\n" );
    }

    //cleanup
    free( data );
}
For the input
This is a test file with a target.
this program has the following output:
Found "target".
Note that every time I call realloc, I double the size of the buffer instead of adding a constant amount to it. This is important for the following reason:
Let's say that the file has a size of 160 MB (megabytes). In my program, I have an initial buffer size of about 16 KB (kilobytes). If I didn't double the size of the buffer every time I call realloc, but instead added a constant amount, for example another 16 KB, then I would need to call realloc 10,000 times. Every time realloc is called, it may have to copy the content of the entire buffer, which means that on average 80 MB may have to be copied every time, or 800 GB (nearly a terabyte) in total. This would be highly inefficient.
However, if I instead double the size of the memory buffer (i.e. let the buffer grow exponentially), then the total amount of data that realloc may have to copy stays proportional to the size of the file: each buffer is twice as large as the previous one, so all previous buffers together are smaller than the current one. In my example above, this means that only a few hundred megabytes have to be copied by realloc in total, instead of 800 GB.
Answer:
You could just estimate what you need in blocks and grow the input buffer as needed...
This is untested, but gives the flavour of what should work.
#define BLK_SIZE 1024

FILE *fp = fopen( "/sys/file", "rb" );
if( fp == NULL )
    return -1;

char *buf = malloc( BLK_SIZE );
if( buf == NULL ) {
    fclose( fp );
    return -1;
}
char *readTo = buf;
size_t bufCnt = 0;
for( ;; ) {
    size_t inCnt = fread( readTo, sizeof *readTo, BLK_SIZE, fp );
    bufCnt += inCnt;
    if( inCnt < BLK_SIZE )
        break; // end of file (or a read error; check ferror( fp ) if needed)
    char *tmp = realloc( buf, bufCnt + BLK_SIZE );
    if( tmp == NULL ) {
        free( buf );
        fclose( fp );
        return -1;
    }
    buf = tmp;
    readTo = buf + bufCnt;
}
fclose( fp );
printf( "Got %zu valid bytes in buffer\n", bufCnt );
// do stuff with *buf (note: it is not null-terminated)
free( buf );
EDIT:
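In light of the sharp observation by @Andreas Wenzel, the objective is only to find "foo" (the characters 'f', 'o', 'o') in the open file, so the file can be scanned in fixed-size chunks without keeping all of it in memory. This should do the trick: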
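const char *srchfor = "foo";
const char *found = srchfor; // next character still to be matched; at the beginning
FILE *fp = fopen( "/sys/file", "rb" );
if( fp == NULL )
    return -1; // or deal with this appropriately

char buf[ 4096 ]; // some convenient "page" size
size_t inCnt = 1; // not zero
while( *found && inCnt > 0 ) {
    inCnt = fread( buf, sizeof buf[0], sizeof buf, fp );
    for( size_t i = 0; *found && i < inCnt; i++ ) {
        if( buf[i] == *found )
            found++; // extend the partial match
        else // reset, but let this character start a new match
             // (sufficient for "foo", whose first character does not reappear in it)
            found = ( buf[i] == *srchfor ) ? srchfor + 1 : srchfor;
    }
}
fclose( fp );

if( *found == '\0' )
    printf( "bar.\n" );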
Answer:
If the kernel is creating the file as you read it, and there is a risk that its size will be different the next time you read it, then your only real option is to read it into a buffer before you know how large the file is. Start by allocating a LARGE buffer (big enough that it SHOULD hold the entire file), then call read() to get at most that many bytes. If there is still more to be read, realloc() the buffer you were writing into and continue reading into the newly added space. Repeat the realloc() as often as necessary.