Removing occurrences of "The" and "the"

I was working on an assignment that takes input:

The Dhillon Theatre is now Fun Republic

and outputs:

Dhillon atre is now Fun Republic.

My code kind of works, but it removes every occurrence of 't', 'h', and 'e' individually instead of removing 'the' as a whole.

What should I do? How can I make the program read 'the' as one word and not as separate chars?

Source Code:

#include <stdio.h>
#include <string.h>

//The Dhillon Theatre is now Fun Republic
#define SIZE 100

int func();

int main()
{
    func();
}

int func()
{
    int i, j, inputlen, removelen;
    char input[SIZE];
    char toRemove[4] = "the";

    printf("Enter the string: \n");
    gets(input);

    inputlen = strlen(input);
    removelen = strlen(toRemove);

    for (i = 0; i<inputlen; i++)
    {
        for (j = 0; j<removelen; j++)
        {
            if (input[i] == toRemove[j])
            {
                for (j = i; j<inputlen; j++)
                {
                    input[j] =  input[j + 1];
                }
                inputlen--;
                i--;
            }
        }
    }
    printf("%s", input);
}

CodePudding user response：

How can I make the program read 'the' as one word and not as separate chars?

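The code fails because, even though the title is Removing occurrences of "The" and "the", the goal is to remove "The"/"the" only where it is a complete word, and the loop compares single characters instead.

What is a word? Look for a non-letter (or the end of the string) immediately before and immediately after it.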

"The", "The dog", "What the", "Where is the beef"

But not

"", "There", "These dogs", "What lathe", "Where is other beef"

// Needs #include <ctype.h> and #include <stdbool.h>
// inputlen = strlen(input);
removelen = strlen(toRemove);

const char *read = input;
char *write = input;  // not const: characters are stored through it
bool previous_non_letter = true;  // start of string counts as a boundary
while (*read) {
  // was the previous char a non-letter?
  if (previous_non_letter) {
    // match toRemove?
    int i;
    for (i = 0; i<removelen; i++) {
      if (tolower((unsigned char) read[i]) != tolower((unsigned char) toRemove[i])) {
        break;
      }
    }
    // Complete match and next is a non-letter?
    if (i == removelen && !isalpha((unsigned char) read[i])) {
      // Advance reading by removelen
      read += removelen;
      previous_non_letter = false;
      continue;
    }
  }
  previous_non_letter = !isalpha((unsigned char) *read);
  *write++ = *read++;
}
*write = '\0';

The above takes "the" out (as the title states), but it does not remove a leading or trailing non-letter (e.g. ' '). That addition is left for OP, if desired.

Assumptions: removelen > 0 and the last character of toRemove is a letter. It is OK to remove "thE".

OP's goal is unclear when non-letter, non-space characters are involved, as in "the123 123the456 123the".

CodePudding user response：

One useful function will be the ability to downcase a string. We'll set this aside for now.

void downcase(char *str) {
    for (char *ch = str; *ch; ch++) {
        if (*ch >= 'A' && *ch <= 'Z') {
            *ch = *ch + 32;
        }
    }
}

Another useful function determines whether a substring is a word, so it needs to test the characters immediately before and after it. Obviously, this could be more sophisticated in terms of the characters it looks for, but for educational purposes this will do.

int is_word(char *src, size_t s, size_t len) {
    return ((s == 0 || src[s - 1] == ' ' || src[s - 1] == '.' || src[s - 1] == ',') &&
            (s + len >= strlen(src) || src[s + len] == ' ' || src[s + len] == '.' || src[s + len] == ',' || src[s + len] == '\0'));
}

Now, let's start a remove_word function. It'll iterate over the input string, looking at substrings the length of the word we want to remove. It'll use the is_word function to print a note if the substring in question is a word.

This is, of course, just one solution to this problem.

void remove_word(char *input, char *to_remove, char* dest) {
    int index;

    size_t input_len = strlen(input);
    size_t to_remove_len = strlen(to_remove);

    char temp[to_remove_len + 1];
    char temp_ci[to_remove_len + 1];

    downcase(to_remove);

    for (index = 0; index < input_len - to_remove_len + 1; index++) {
        strncpy(temp, input + index, to_remove_len);
        strncpy(temp_ci, input + index, to_remove_len);
        temp[to_remove_len] = '\0';
        temp_ci[to_remove_len] = '\0';

        downcase(temp_ci);

        printf("%s", temp);

        if (is_word(input, index, to_remove_len)) {
            printf(" -> Word\n");
        }
        else {
            printf("\n");
        }
    }
}

Now, if we test this on your test string:

int main() {
    char replace[] = "the";  // writable array: remove_word() downcases it in place

    char dest[100];

    remove_word("The Dhillon Theatre is now Fun Republic", replace, dest);
}

We get this output:

The -> Word
he
e D
 Dh
Dhi
hil
ill
llo
lon
on
n T
 Th
The
hea
eat
atr
tre
re
e i
 is
is
s n
 no
now -> Word
ow
w F
 Fu
Fun -> Word
un
n R
 Re
Rep
epu
pub
ubl
bli
lic

This is super important to accomplishing your goal. We can now identify words that are the right length to be the word we're looking for.

The word to remove has been downcased, and we have a downcased version of the current substring. It's pretty straightforward to find out if the current word should be removed.

void remove_word(char *input, char *to_remove, char* dest) {
    int index;

    size_t input_len = strlen(input);
    size_t to_remove_len = strlen(to_remove);

    char temp[to_remove_len + 1];
    char temp_ci[to_remove_len + 1];

    downcase(to_remove);

    for (index = 0; index < input_len - to_remove_len + 1; index++) {
        strncpy(temp, input + index, to_remove_len);
        strncpy(temp_ci, input + index, to_remove_len);
        temp[to_remove_len] = '\0';
        temp_ci[to_remove_len] = '\0';

        downcase(temp_ci);

        if (is_word(input, index, to_remove_len)) {
            printf("%3d: %s", index, temp);

            if (strcmp(to_remove, temp_ci) == 0) {
                printf(" -> Bingo!\n");
            }
            else {
                printf(" -> Word\n");
            }
        }
    }
}

Now when we run it:

  0: The -> Bingo!
 23: now -> Word
 27: Fun -> Word

We now know how to find instances of the word to remove, and the index where they start.

Now we simply have to implement the copying (or not copying) into dest based on this information. For this I've added a write_index that will keep track of where to insert characters into dest.

void remove_word(char *input, char *to_remove, char* dest) {
    size_t index = 0, write_index = 0;

    size_t input_len = strlen(input);
    size_t to_remove_len = strlen(to_remove);

    char temp_ci[to_remove_len + 1];

    downcase(to_remove);

    while (index < input_len) {
        // Only look for the word while a full-length substring still fits.
        if (to_remove_len > 0 && index + to_remove_len <= input_len) {
            strncpy(temp_ci, input + index, to_remove_len);
            temp_ci[to_remove_len] = '\0';

            downcase(temp_ci);

            if (is_word(input, index, to_remove_len) && strcmp(to_remove, temp_ci) == 0) {
                // Skip the word without writing anything to dest.
                index += to_remove_len;
                continue;
            }
        }

        // Copy one character and advance both positions.
        dest[write_index++] = input[index++];
    }

    dest[write_index] = '\0';
}

Now running your test:

int main() {
    char replace[] = "the";  // writable array: remove_word() downcases it in place
    char dest[100] = {0};

    remove_word("The Dhillon Theatre is now Fun Republic", replace, dest);

    printf("%s\n", dest);
}

We get:

$ ./a.out
 Dhillon Theatre is now Fun Republic
$