This is my first time asking a question here. I'm currently learning C and Linux at the same time. I'm working on a simple C program that uses only system calls to read and write files. My problem is: how can I read the file and check whether adjacent lines/words are the same or not? For example:
foo.txt contains:
hi
bye
bye
hi
hi
And bar.txt is empty.
After I do:
./myuniq foo.txt bar.txt
The result in bar.txt will be like:
hi
bye
hi
The result will just be like when we use uniq in Linux.
Here is my code:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#define LINE_MAX 256
int main(int argc, char * argv[]){
    int wfd,rfd;
    size_t n;
    char temp[LINE_MAX];
    char buf[LINE_MAX];
    char buf2[LINE_MAX];
    char *ptr=buf;
    if(argc!=3){
        printf("Invalid useage: ./excutableFileName readFromThisFile writeToThisFile\n");
        return -1;
    }
    rfd=open(argv[1], O_RDONLY);
    if(rfd==-1){
        printf("Unable to read the file\n");
        return -1;
    }
    wfd=open(argv[2], O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
    if(wfd==-1){
        printf("Unable to write to the file\n");
        return -1;
    }
    while(n = read(rfd,buf,LINE_MAX)){
        write(wfd,buf,n);
    }
    close(rfd);
    close(wfd);
    return 0;
}
The code above does the reading and writing with no issue. But I can't figure out how to walk through the C-style string one char at a time, or what the condition of the while loop should be.
I know I probably need a pointer that travels through buf to find the next newline '\n', and something like:
while(condi){
    if(*ptr == '\n'){
        strcpy(temp, buf);
        strcpy(buf, buf2);
        strcpy(buf2, temp);
    }
    else
        write(wfd,buf,n);
    *ptr++;
}
But I might be wrong, since I can't get it to work. Any feedback would help. Thank you.
And again, this program may only use system calls. I know there is an easier way using FILE and fgets or something similar, but that's not an option here.
CodePudding user response:
You only need one buffer that stores whatever the previous line contained.
The way this works for the current line is that before you add a character you test whether what you're adding is the same as what's already in there. If it's different, then the current line is marked as unique. When you reach the end of the line, you then know whether to output the buffer or not.
Implementing the above idea using standard input for simplicity (but it doesn't really matter how you read your characters):
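// buf is assumed to be a char array large enough for the longest line
int len = 0;
int dup = 0;
for (int c; (c = fgetc(stdin)) != EOF; )
{
    // Check for duplicate and store
    // (cast so bytes >= 0x80 still compare equal where char is signed)
    if (dup && buf[len] != (char)c)
        dup = 0;
    buf[len++] = c;
    // Handle end of line
    if (c == '\n')
    {
        // Output the line only if it differs from the previous one;
        // buf is not NUL-terminated, so print exactly len bytes
        if (!dup) printf("%.*s", len, buf);
        len = 0;
        dup = 1;
    }
}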
Note that we use the dup flag to represent whether a line is duplicated. The first line clearly is not, and every subsequent line starts off with the assumption that it is a duplicate. From there, a line either remains a duplicate or is detected as unique as soon as one character differs.
The comparison before the store also avoids testing against uninitialized buffer values, by way of short-circuit evaluation. That's all managed by the dup flag -- you only test if you know the buffer is already good up to this point:
    if (dup && buf[len] != (char)c)
        dup = 0;
That's basically all you need. Now, you should definitely add a bounds check to prevent buffer overflow. Or you may wish to use a dynamic buffer that grows as necessary.
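For the fixed-size case, the bounds check could be sketched roughly as follows. The function name uniq_bounded and the capacity BUF_CAP are made up for this illustration, and the choice to silently truncate overlong lines (rather than reject them) is an assumption, not something the approach above prescribes. The sketch also emits a final line that lacks a trailing newline.

```c
#include <stdio.h>

#define BUF_CAP 256  /* hypothetical fixed capacity, including the newline */

/* Copy lines from in to out, skipping lines equal to the previous one.
   Lines longer than BUF_CAP - 1 bytes are truncated (an assumption). */
void uniq_bounded(FILE *in, FILE *out)
{
    char buf[BUF_CAP];
    size_t len = 0;
    int dup = 0;

    for (int c; (c = fgetc(in)) != EOF; )
    {
        /* Out of room: drop the byte, but keep the last slot for '\n' */
        if (len == BUF_CAP - 1 && c != '\n')
            continue;

        /* Cast so bytes >= 0x80 compare equal where char is signed */
        if (dup && buf[len] != (char)c)
            dup = 0;
        buf[len++] = (char)c;

        if (c == '\n')
        {
            if (!dup)
                fwrite(buf, 1, len, out);
            len = 0;
            dup = 1;
        }
    }

    /* Final line without a trailing newline: it is a duplicate only if
       the previous line ended exactly at this position. */
    if (len > 0 && !(dup && buf[len] == '\n'))
    {
        fwrite(buf, 1, len, out);
        fputc('\n', out);
    }
}
```

Calling uniq_bounded(stdin, stdout) would give the same standard-I/O behaviour as the loop above, with the overflow guarded.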
An entire program that operates on standard I/O streams, plus handles arbitrary-length lines might look like this:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    size_t capacity = 15, len = 0;
    char *buf = malloc(capacity);
    if (!buf) return EXIT_FAILURE;
    for (int c, dup = 0; (c = fgetc(stdin)) != EOF || len > 0; )
    {
        // Grow buffer
        if (len == capacity)
        {
            capacity = (capacity * 2) + 1;
            char *newbuf = realloc(buf, capacity);
            if (!newbuf) break;
            buf = newbuf;
            dup = 0;
        }
        // NUL-terminate end of line, update duplicate-flag and store
        if (c == '\n' || c == EOF)
            c = '\0';
        // Cast so bytes >= 0x80 still compare equal where char is signed
        if (dup && buf[len] != (char)c)
            dup = 0;
        buf[len++] = c;
        // Output line if not a duplicate, and reset
        if (!c)
        {
            if (!dup)
                printf("%s\n", buf);
            len = 0;
            dup = 1;
        }
    }
    free(buf);
}
Demo here: https://godbolt.org/z/GzGz3nxMK
CodePudding user response:
If you must use the read and write system calls, you will have to build an abstraction around them, as they have no notion of lines, words, or characters. Semantically, they deal purely with bytes.
Reading arbitrarily-sized chunks of the file would require us to sift through looking for line breaks. This would mean tokenizing the data in our buffer, as you have somewhat shown. A problem occurs when our buffer ends with a partial line. We would need to make adjustments so our next read call concatenates the rest of the line.
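For reference, the chunked approach with the partial-line adjustment might be sketched like this. Everything named here (CHUNK, line_fn, for_each_line, struct counts, count_line) is hypothetical and only serves the illustration. Lines are handed to a callback with an explicit length, since they are not NUL-terminated, and a line longer than CHUNK is flushed in pieces, which is a simplifying assumption.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096  /* hypothetical chunk size */

/* Callback invoked once per line; line is NOT NUL-terminated. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Read fd in chunks and call on_line for every line found.
   Returns 0 on end of file, -1 if read fails. */
int for_each_line(int fd, line_fn on_line, void *ctx)
{
    char buf[CHUNK];
    size_t have = 0;  /* bytes currently buffered */
    ssize_t r;

    while ((r = read(fd, buf + have, sizeof buf - have)) > 0) {
        size_t start = 0;
        char *nl;

        have += (size_t)r;

        /* Hand over every complete line in the buffer. */
        while ((nl = memchr(buf + start, '\n', have - start)) != NULL) {
            size_t len = (size_t)(nl - (buf + start)) + 1;  /* keep '\n' */
            on_line(buf + start, len, ctx);
            start += len;
        }

        /* Slide the partial line to the front so the next read
           appends the rest of it. */
        have -= start;
        memmove(buf, buf + start, have);

        /* Assumption: a line longer than CHUNK is flushed in pieces. */
        if (have == sizeof buf) {
            on_line(buf, have, ctx);
            have = 0;
        }
    }

    if (r == 0 && have > 0)
        on_line(buf, have, ctx);  /* last line had no trailing newline */

    return r < 0 ? -1 : 0;
}

/* Example callback: counts lines and bytes seen. */
struct counts { size_t lines, bytes; };

void count_line(const char *line, size_t len, void *ctx)
{
    struct counts *c = ctx;
    (void)line;
    c->lines++;
    c->bytes += len;
}
```

A uniq-style callback would keep a copy of the previous line in its context and compare lengths and bytes (for example with memcmp) before writing.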
To keep things simple, instead, we might consider reading the file one byte at a time.
A decent (if naive) way to begin is by essentially reimplementing the rough functionality of fgets. Here we read a single byte at a time into our buffer, at the current offset. We end when we find a newline character, or when we would no longer have enough room in the buffer for the null-terminating character.
Unlike fgets, here we return the length of our string.
size_t read_a_line(char *buf, size_t bufsize, int fd)
{
    size_t offset = 0;
    /* Read one byte at a time, always leaving room for the terminator */
    while (offset < (bufsize - 1) && read(fd, buf + offset, 1) > 0)
        if (buf[offset++] == '\n')
            break;
    buf[offset] = '\0';
    return offset;
}
To mimic uniq, we can create two buffers, as you have, but initialize their contents to empty strings. We take two additional pointers to manipulate later.
char buf[LINE_MAX] = { 0 };
char buf2[LINE_MAX] = { 0 };
char *flip = buf;
char *flop = buf2;
After opening our files for reading and writing, our loop begins. We continue this loop as long as we read a nonzero-length string.
If our current string does not match our previously read string, we write it to our output file. Afterwards, we swap our pointers. On the next iteration, from the perspective of our pointers, the secondary buffer now contains the previous line, and the primary buffer is overwritten with the current line.
Again, note that our initial previously read line is the empty string.
size_t length;
while ((length = read_a_line(flip, LINE_MAX, rfd))) {
    if (0 != strcmp(flip, flop))
        write(wfd, flip, length);
    swap_two_pointers(&flip, &flop);
}
Our pointer swapping function.
void swap_two_pointers(char **a, char **b) {
    char *t = *a;
    *a = *b;
    *b = t;
}
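Put together, the pieces above might look like the following sketch. The wrapper copy_unique_lines and the constant MAX_LINE (standing in for LINE_MAX) are names invented here, and treating a short write as an error is an assumption made to keep the example small.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_LINE 256  /* stands in for the LINE_MAX used above */

/* Read one line (up to bufsize - 1 bytes) a byte at a time. */
size_t read_a_line(char *buf, size_t bufsize, int fd)
{
    size_t offset = 0;
    while (offset < (bufsize - 1) && read(fd, buf + offset, 1) > 0)
        if (buf[offset++] == '\n')
            break;
    buf[offset] = '\0';
    return offset;
}

void swap_two_pointers(char **a, char **b)
{
    char *t = *a;
    *a = *b;
    *b = t;
}

/* Hypothetical wrapper: copy lines from rfd to wfd, skipping any line
   equal to the one before it. Returns 0 on success, -1 on write error. */
int copy_unique_lines(int rfd, int wfd)
{
    char buf[MAX_LINE] = { 0 };
    char buf2[MAX_LINE] = { 0 };
    char *flip = buf;   /* current line */
    char *flop = buf2;  /* previous line, initially the empty string */
    size_t length;

    while ((length = read_a_line(flip, MAX_LINE, rfd))) {
        if (0 != strcmp(flip, flop) &&
            write(wfd, flip, length) != (ssize_t)length)
            return -1;  /* assumption: a short write counts as failure */
        swap_two_pointers(&flip, &flop);
    }
    return 0;
}
```

In main, this would be called with the two descriptors returned by open. Adding O_TRUNC to the flags of the output open avoids leftover bytes when the output file already exists and is longer than the new result.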
Some notes:
- The contents of our file-to-be-read should never contain a line that would exceed LINE_MAX - 1 bytes (including the newline character), since one byte of the buffer is reserved for the null terminator. We do not handle this situation, which is admittedly a sidestep, as this is the problem we wanted to avoid with the chunking method.
- read_a_line should not be passed NULL or 0 as its first and second arguments. An exercise for the reader to figure out why that is.
- read_a_line does not really handle read failing in the middle of a line.
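One way the last two notes could be addressed is sketched below. The name read_a_line_checked and its return convention (-1 on failure, otherwise the line length) are assumptions for illustration. Retrying on EINTR is a common convention for read rather than something the code above requires.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variant of read_a_line that reports failures.
   Returns the line length, 0 at end of file, or -1 on error. */
ssize_t read_a_line_checked(char *buf, size_t bufsize, int fd)
{
    size_t offset = 0;

    /* Reject arguments that leave no room for the terminator */
    if (buf == NULL || bufsize == 0)
        return -1;

    while (offset < bufsize - 1) {
        ssize_t r = read(fd, buf + offset, 1);

        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            buf[offset] = '\0'; /* keep whatever was read as a valid string */
            return -1;
        }
        if (r == 0)
            break;              /* end of file */
        if (buf[offset++] == '\n')
            break;              /* end of line */
    }

    buf[offset] = '\0';
    return (ssize_t)offset;
}
```

A caller would loop while the return value is greater than zero and treat -1 separately from a clean end of file.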