For some functions for string manipulation, I try to rewrite the function output onto the original string. I came up with the general scheme of
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *char_repeater(char *str, char ch)
{
int tmp_len = strlen(str) 1; // initial size of tmp
char *tmp = (char *)malloc(tmp_len); // initial size of tmp
// the process is normally too complicated to calculate the final length here
int j = 0;
for (int i = 0; i < strlen(str); i )
{
tmp[j] = str[i];
j ;
if (str[i] == ch)
{
tmp[j] = str[i];
j ;
}
if (j > tmp_len)
{
tmp_len *= 2; // growth factor
tmp = realloc(tmp, tmp_len);
}
}
tmp[j] = 0;
char *output = (char *)malloc(strlen(tmp) 1);
// output matching the final string length
strncpy(output, tmp, strlen(tmp));
output[strlen(tmp)] = 0;
free(tmp); // Is it necessary?
return output;
}
int main()
{
char *str = "This is a test";
str = char_repeater(str, 'i');
puts(str);
free(str);
return 0;
}
Although it works on simple tests, I am not sure if I am on the right track.
- Is this approach safe overall?
- Of course, we do not re-write the string. We simply write new data (array of the characters) at the same pointer. If
output
is longer thanstr
, it will rewrite the data previously written atstr
, but ifoutput
is shorter, the old data remains, and we would have a memory leak. How can wefree(str)
within the function before outputting to its pointer?
CodePudding user response:
A pair of pointers can be used to iterate through the string.
When a matching character is found, increment the length.
Allocate output
as needed.
Iterate through the string again and assign the characters.
This could be done in place if str
was malloced in main
.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *char_repeater(char *str, char ch)
{
int tmp_len = strlen(str) 1; // initial size of tmp
char *find = str;
while ( *find) // not at terminating zero
{
if ( *find == ch) // match
{
tmp_len ; // add one
}
find; // advance pointer
}
char *output = NULL;
if ( NULL == ( output = malloc(tmp_len)))
{
fprintf ( stderr, "malloc peoblem\n");
exit ( 1);
}
// output matching the final string length
char *store = output; // to advance through output
find = str; // reset pointer
while ( *find) // not at terminating zero
{
*store = *find; // assign
if ( *find == ch) // match
{
store; // advance pointer
*store = ch; // assign
}
store; // advance pointer
find;
}
*store = 0; // terminate
return output;
}
int main()
{
char *str = "This is a test";
str = char_repeater(str, 'i');
puts(str);
free(str);
return 0;
}
CodePudding user response:
For starters the function should be declared like
char * char_repeater( const char *s, char c );
because the function does not change the passed string.
Your function is unsafe and inefficient at least because there are many dynamic memory allocations. You need to check that each dynamic memory allocation was successful. Also there are called the function strlen
also too ofhen.
Also this code snippet
tmp[j] = str[i];
j ;
if (str[i] == ch)
{
tmp[j] = str[i];
j ;
}
if (j > tmp_len)
//...
can invoke undefined behavior. Imagine that the source string contains only one letter 'i'
. In this case the variable tmp_len
is equal to 2
. So temp[0]
will be equal to 'i'
and temp[1]
also will be equal to 'i'
. In this case j
equal to 2
will not be greater than tmp_len
. As a result this statement
tmp[j] = 0;
will write outside the allocated memory.
And it is a bad idea to reassign the pointer str
char *str = "This is a test";
str = char_repeater(str, 'i');
As for your question whether you need to free the dynamically allocated array tmp
free(tmp); // Is it necessary?
then of course you need to free it because you allocated a new array for the result string
char *output = (char *)malloc(strlen(tmp) 1);
And as for your another question
but if output is shorter, the old data remains, and we would have a memory leak. How can we free(str) within the function before outputting to its pointer?
then it does not make a sense. The function creates a new character array dynamically that you need to free and the address of the allocated array is assigned to the pointer str
in main that as I already mentioned is not a good idea.
You need at first count the length of the result array that will contain duplicated characters and after that allocate memory only one time.
Here is a demonstration program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char * char_repeater( const char *s, char c )
{
size_t n = 0;
for ( const char *p = s; ( p = strchr( p, c ) ) != NULL; p )
{
n;
}
char *result = malloc( strlen( s ) 1 n );
if ( result != NULL )
{
if ( n == 0 )
{
strcpy( result, s );
}
else
{
char *p = result;
do
{
*p = *s;
if (*s == c ) *p = c;
} while ( *s );
}
}
return result;
}
int main( void )
{
const char *s = "This is a test";
puts( s );
char *result = char_repeater( s, 'i' );
if ( result != NULL ) puts( result );
free( result );
}
The program output is
This is a test
Thiis iis a test
CodePudding user response:
My kneejerk reaction is to dislike the design. But I have reasons.
First, realloc()
is actually quite efficient. If you are just allocating a few extra bytes every loop, then chances are that the standard library implementation simply increases the internal bytecount value associated with your memory. Caveats are:
Interleaving memory management.
Your function here doesn’t have any, but should you start calling other routines then keeping track of all that becomes an issue. Anything that calls other memory management routines can lead to the next problem:Fragmented memory.
If at any time the available block is too small for your new request, then a much more expensive operation to obtain more memory and copy everything over becomes an issue.
Algorithmic issues are:
- Mixing memory management in increases the complexity of your code.
- Every occurrence of
c
invokes a function call with potential to be expensive. You cannot control when it is expensive and when it is not. - Worst-case options (
char_repeater( "aaaaaaaaaa", 'a' )
) trigger worst-case potentialities.
My recommendation is to simply make two passes.
This passes several smell tests:
- Algorithmic complexity is broken down into two simpler parts:
- counting space required, and
- allocating and copying.
- Worst-case scenarios for allocation/reallocation are reduced to a single call to
malloc()
. - Issues with very large strings are reduced:
- You need at most space for 2 large strings (not 3, possibly repeated)
- Page fault / cache boundary issues are similar (or the same) for both methods
Considering there are no real downsides to using a two-pass approach, I think that using a simpler algorithm is reasonable. Here’s code:
#include <stdio.h>
#include <stdlib.h>
char * char_repeater( const char * s, char c )
{
// FIRST PASS
// (1) count occurances of c in s
size_t number_of_c = 0;
const char * p = s;
while (*p) number_of_c = (*p == c);
// (2) get strlen s
size_t length_of_s = p - s;
// SECOND PASS
// (3) allocate space for the resulting string
char * dest = malloc( length_of_s number_of_c 1 );
// (4) copy s -> dest, duplicating every occurance of c
if (dest)
{
char * d = dest;
while (*s)
if ((*d = *s ) == c)
*d = c;
*d = '\0';
}
return dest;
}
int main(void)
{
char * s = char_repeater( "Hello world!", 'o' );
puts( s );
free( s );
return 0;
}
As always, know your data
Whether or not a two-pass approach actually is better than a realloc()
approach depends on more factors than what is evident in a posting on the internet.
Nevertheless, I would wager that for general purpose strings that this is a better choice.
But, even if it isn’t, I would argue that a simpler algorithm, splitting tasks into trivial sub-tasks, is far easier to read and maintain. You should only start making tricky algorithms only if you have use-case profiling saying you need to spend more attention on it.
Without that, readability and maintainability trumps all other concerns.