I need to separate some sentences. For example, the txt file is something like:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Donec commodo metus sit amet mauris facilisis, fringilla convallis erat dictum.

Quisque scelerisque turpis hendrerit, sodales erat et, convallis nisl.
Etiam ultrices vulputate purus, id tincidunt purus semper vel.
There are many blocks (by block, I mean two sentences in a row), so I cannot separate them manually. I need to separate them at the blank lines between them. However, fgets works line by line, so it would give me:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Donec commodo metus sit amet mauris facilisis, fringilla convallis erat dictum.
Quisque scelerisque turpis hendrerit, sodales erat et, convallis nisl.
Etiam ultrices vulputate purus, id tincidunt purus semper vel.
What should I do? I have thought about it, but I have no starting point. Thanks for any help.
Edit: Since many people did not understand, I see I was unclear. The point is that, from the txt file above, I need to separate those sentences at the blank lines and add each group of sentences to an array (an array of strings in this case).
So, when this process is done, arrayofstrings[0] must give us
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Donec commodo metus sit amet mauris facilisis, fringilla convallis erat dictum.
The next index should hold the next block in the same way. One of the problems is that I cannot be sure that the blocks stored at each index always consist of two sentences. I mean, for an index i, arrayofstrings[i] can be:
Ut mattis mi ac purus tempor bibendum.
Praesent sed metus enim.
Pellentesque at orci id mauris consectetur consequat.
So the process cannot rely on the two-line idea.
CodePudding user response:
Reading each continuous block of text (including the embedded '\n' characters) can be done in a number of ways, but one of the simplest approaches is to keep a simple flag that tracks whether you are between paragraphs reading whitespace, or in a block reading text. (The flag is a simple state variable.)
Then it's just a matter of reading each line and, if it's part of a block, appending each line in the block to a single index in your array, or if it's a blank line, advance to the next index and reset your variables to prepare for reading the next block. If using a fixed-size array, don't forget to protect your array bounds by checking each new line appended at an index will fit. A rough outline would be:
- (for fixed array of strings) declare an array of rows and columns with the columns sufficient to hold each block of text.
- start with your read-state variable set to 0 (false) indicating you are before or between lines of text.
- while your array isn't full, read each line.
- if the line contains only a '\n' character,
  - check your flag to determine if you were reading text before this line, if so you are done filling the array index,
  - advance the index to next,
  - reset your flag to 0, and
  - reset number of chars used at index to 0.
- the only other alternative (the else part) is that you read a line containing text that is part of a block. Here you would:
  - compute the total number of bytes needed by what is currently stored at the index, plus the length of the new line (plus 1 for the nul-terminating character).
  - if line will fit in index,
    - append current line to index
    - update total characters stored in index
  - otherwise (else) line won't fit in index, handle the error
  - set in block flag to 1 (true)
Now obviously rather than using a fixed array, you can either use an array of pointers and allocate storage for each index as needed, or you can use a pointer-to-pointer and allocate both pointers-as-needed and storage for each line. Up to you.
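As a rough illustration of the pointer-to-pointer alternative mentioned above, the following is a minimal sketch that allocates both the pointers and the storage for each block as needed. The function names read_blocks and free_blocks, the growth strategy (doubling the number of pointers) and the MAXCHR value are assumptions made for this sketch, not part of the fixed-array example. It also assumes every line in the file is shorter than MAXCHR characters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXCHR 256  /* assumed max number of chars per line in read-buffer */

/* hypothetical helper: free n blocks and the pointer array itself */
void free_blocks (char **blocks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free (blocks[i]);
    free (blocks);
}

/* hypothetical helper: read blank-line separated blocks from fp.
 * Returns an allocated array of allocated strings and stores the
 * number of blocks in *n. Returns NULL with *n == 0 if no block was
 * read or if an allocation failed.
 */
char **read_blocks (FILE *fp, size_t *n)
{
    char buf[MAXCHR], **blocks = NULL;
    size_t used = 0,    /* number of completed blocks */
           avail = 0,   /* number of pointers allocated */
           len = 0;     /* chars stored in the current block */
    int inblk = 0;      /* flag - in block reading text */

    *n = 0;
    while (fgets (buf, MAXCHR, fp)) {
        if (*buf == '\n') {             /* empty line ends a block */
            if (inblk)
                used++;
            inblk = 0;
            len = 0;
            continue;
        }
        if (!inblk) {                   /* first line of a new block */
            if (used == avail) {        /* no free pointer, grow the array */
                size_t newavail = avail ? avail * 2 : 2;
                char **tmp = realloc (blocks, newavail * sizeof *tmp);
                if (!tmp) {
                    free_blocks (blocks, used);
                    return NULL;
                }
                blocks = tmp;
                avail = newavail;
            }
            blocks[used] = NULL;        /* so realloc below acts as malloc */
        }
        size_t add = strlen (buf);
        char *tmp = realloc (blocks[used], len + add + 1);
        if (!tmp) {                     /* free partial block, then the rest */
            free (blocks[used]);
            free_blocks (blocks, used);
            return NULL;
        }
        memcpy (tmp + len, buf, add + 1);   /* append line with its '\0' */
        blocks[used] = tmp;
        len += add;
        inblk = 1;
    }
    if (inblk)                          /* last block not followed by blank line */
        used++;
    *n = used;
    return blocks;
}
```

A caller would pass an open FILE pointer and a size_t to receive the count, loop over the returned strings, and call free_blocks when done. The state flag plays the same role as in the outline; only the storage changes.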
Turning the outline into a short example that uses the inblk variable as your flag to determine if you are in a block reading lines, or before or between blocks, and uses offset to track the current number of characters used at each index to protect the fixed array bounds, you could do:
#include <stdio.h>
#include <string.h>
#define NROWS 128 /* max number of rows (sentences) in array */
#define MAXCHR 256 /* max number of chars in read-buffer */
int main (int argc, char **argv) {
char buf[MAXCHR] = "", /* buffer to hold each line */
array[NROWS][MAXCHR] = {""}; /* array of strings */
int inblk = 0, /* flag - in block reading text */
ndx = 0, /* array index */
offset = 0; /* offset in index to write string */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
/* while array not full, read line into buf */
while (ndx < NROWS && fgets (buf, MAXCHR, fp)) {
if (*buf == '\n') { /* 1st char is \n ? */
if (inblk) { /* if in block ? */
ndx += 1; /* end of block, advance index */
}
inblk = 0; /* reset flag 0 (false) */
offset = 0; /* reset offset */
}
else { /* otherwise reading line in block */
int reqd = offset + strlen (buf) + 1; /* get total required chars */
if (reqd < MAXCHR) { /* line will fit in array */
strcat (array[ndx], buf); /* append buf to index */
offset = strlen (array[ndx]); /* update offset to end */
}
else { /* line won't fit in remaining space, handle error */
fputs ("error: line exceeds storage for array.\n", stderr);
return 1;
}
inblk = 1; /* set in block flag 1 (true) */
}
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
for (int i = 0; i < ndx; i++) { /* output results */
printf ("array[%2d]:\n%s\n", i, array[i]);
}
}
Example Input File
Given your description of an inconsistent number of lines per-block and potentially inconsistent number of empty lines between the blocks, the following was used:
$ cat dat/blks.txt
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Donec commodo metus sit amet mauris facilisis, fringilla convallis erat dictum.
Quisque scelerisque turpis hendrerit, sodales erat et, convallis nisl.
Etiam ultrices vulputate purus, id tincidunt purus semper vel.
Ut mattis mi ac purus tempor bibendum.
Praesent sed metus enim.
Pellentesque at orci id mauris consectetur consequat.
Example Use/Output
Providing the filename to read as the first argument to the program (or redirecting the file on stdin to the program) would result in the following:
$ ./bin/combineblks dat/blks.txt
array[ 0]:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Donec commodo metus sit amet mauris facilisis, fringilla convallis erat dictum.
array[ 1]:
Quisque scelerisque turpis hendrerit, sodales erat et, convallis nisl.
Etiam ultrices vulputate purus, id tincidunt purus semper vel.
array[ 2]:
Ut mattis mi ac purus tempor bibendum.
Praesent sed metus enim.
Pellentesque at orci id mauris consectetur consequat.
Where each array index holds a complete block of text from the file, including the embedded and trailing '\n' characters.
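If the trailing '\n' is not wanted in the stored block, it can be removed after reading. The following is a minimal sketch; the helper name trim_trailing_newlines is an assumption for illustration and is not part of the program above. It only removes newlines at the very end of the string and leaves the embedded ones alone.

```c
#include <string.h>

/* hypothetical helper: remove every '\n' at the end of s in place,
 * leaving embedded '\n' characters between lines untouched.
 */
void trim_trailing_newlines (char *s)
{
    size_t len = strlen (s);

    while (len && s[len - 1] == '\n')   /* walk back over trailing newlines */
        s[--len] = '\0';
}
```

It could be called as trim_trailing_newlines (array[i]) for each filled index before printing.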
CodePudding user response:
I have solved this using the following algorithm:
I create an array char *strings[MAX_STRINGS], in which every pointer is initialized to NULL to indicate that it does not yet point to a valid string. I read one line at a time using fgets and append that line to the current string; an empty line ends the current string. I use dynamic memory allocation (i.e. malloc and realloc) to store and grow the actual strings, but the array strings itself is fixed-length.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_STRINGS 200
#define MAX_LINE_LENGTH 300
int main( void )
{
char *strings[MAX_STRINGS] = { NULL };
int num_strings = 0;
char line[MAX_LINE_LENGTH];
//read one line of input per loop iteration
while ( fgets( line, sizeof line, stdin ) != NULL )
{
//make sure that line was not too long for input buffer
if ( strchr( line, '\n' ) == NULL )
{
size_t len;
//a missing newline character is not wrong
//if end-of-file has been reached
if ( !feof(stdin) )
{
fprintf( stderr, "Line too long for input buffer!\n" );
exit( EXIT_FAILURE );
}
//newline character is missing at end-of-file, so add it
len = strlen( line );
if ( len + 1 == sizeof line )
{
fprintf( stderr, "No room for adding newline character!\n" );
exit( EXIT_FAILURE );
}
line[len] = '\n';
line[len + 1] = '\0';
}
//determine whether line is empty
if ( strcmp( line, "\n" ) == 0 )
{
//determine whether current string already has content
if ( num_strings < MAX_STRINGS && strings[num_strings] != NULL )
{
num_strings++;
}
//skip to next line
continue;
}
//make sure that maximum number of strings has not been exceeded
if ( num_strings == MAX_STRINGS )
{
fprintf( stderr, "Maximum number of strings exceeded!\n" );
exit( EXIT_FAILURE );
}
//determine whether current string already exists
if ( strings[num_strings] == NULL )
{
//allocate memory for new string
strings[num_strings] = malloc( strlen(line) + 1 );
if ( strings[num_strings] == NULL )
{
fprintf( stderr, "Memory allocation failure!\n" );
exit( EXIT_FAILURE );
}
//copy string to allocated memory
strcpy( strings[num_strings], line );
}
else
{
size_t len;
//resize memory buffer for adding new string
len = strlen( strings[num_strings] );
len += strlen(line) + 1;
strings[num_strings] = realloc( strings[num_strings], len );
if ( strings[num_strings] == NULL )
{
fprintf( stderr, "Memory allocation failure!\n" );
exit( EXIT_FAILURE );
}
//concatenate the current line with the existing string
strcat( strings[num_strings], line );
}
}
//mark last string as complete, if it exists
if ( num_strings < MAX_STRINGS && strings[num_strings] != NULL )
{
num_strings++;
}
//print results
printf( "Found a total of %d strings.\n\n", num_strings );
for ( int i = 0; i < num_strings; i++ )
{
printf( "strings[%d] has the following content:\n%s\n", i, strings[i] );
//perform cleanup
free( strings[i] );
}
}
For the input posted in the question, this program has the following output:
Found a total of 2 strings.
strings[0] has the following content:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Donec commodo metus sit amet mauris facilisis, fringilla convallis erat dictum.
strings[1] has the following content:
Quisque scelerisque turpis hendrerit, sodales erat et, convallis nisl.
Etiam ultrices vulputate purus, id tincidunt purus semper vel.
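A possible refinement, which is not part of the program above: the test strcmp( line, "\n" ) == 0 only recognizes a completely empty line as a separator. If the file may contain separator lines with stray spaces, tabs, or a '\r' from Windows line endings, a whitespace-only check could be used instead. The helper name is_blank_line is an assumption for this sketch.

```c
#include <ctype.h>

/* hypothetical helper: return 1 if line contains only whitespace
 * characters (or is empty), otherwise return 0.
 */
int is_blank_line (const char *line)
{
    while (*line) {
        /* cast is needed because isspace expects an unsigned char value */
        if (!isspace ((unsigned char)*line))
            return 0;
        line++;
    }
    return 1;
}
```

With this helper, the empty-line test in the loop could be written as if ( is_blank_line( line ) ). Whether whitespace-only lines should count as separators depends on the input files, so treat this as an option rather than a requirement.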