Detect array of string in another string

Time:10-11

I'm a beginner in programming, and I want to solve this problem:

(Spam Scanner) Spam (or junk e-mail) costs U.S. organizations billions of dollars a year in spam-prevention software, equipment, network resources, bandwidth, and lost productivity. Research online some of the most common spam e-mail messages and words, and check your own junk e-mail folder. Create a list of 30 words and phrases commonly found in spam messages. Write a program in which the user enters an e-mail message. Read the message into a large character array and ensure that the program does not attempt to insert characters past the end of the array. Then scan the message for each of the 30 keywords or phrases. For each occurrence of one of these within the message, add a point to the message’s “spam score.” Next, rate the likelihood that the message is spam, based on the number of points it received.

I tried writing my code like this:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

void find_string(char *emailSearch);
const char spam[][30] = {
"congratulation",
"free",
"100%",
"earn",
"million",
"click",
"here",
"instant",
"limited",
"urgent",
"winner",
"selected",
"bargain",
"deal",
"debt",
"lifetime",
"cheap",
"easy",
"bonus",
"credit",
"bullshit",
"scam",
"junk",
"spam",
"passwords",
"invest",
"bulk",
"exclusive",
"win",
"sign"};

int main(){
char email[1000];
    printf("Enter your short email message: \n");
    fgets(email, 80, stdin);
    email[strlen(email)-1] = '\0';
    find_string(email);
    return 0;
    }

void find_string(char *emailSearch){
int i = 0;
    while(emailSearch[i]){
        (tolower(emailSearch[i]));
        i++;
    }
    if(strstr(emailSearch,spam)){
        printf("Your email message is considered spam!");
    }
    else{
        printf("Your email is not spam!");
    }
}

I tried entering words from the spam array, but the output is still "Your email is not spam!". Can anyone fix this?

CodePudding user response:

The main issue is that you need to iterate over each of your spam words and search for it in your text. If you have strcasestr(), use that instead of strtolower(email):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define LEN 79

const char *spam[] = {
    "congratulation",
//  ...
};

char *strtolower(char *s) {
    size_t n = strlen(s);
    for(size_t i = 0; i < n; i++) {
        // tolower() expects a value representable as unsigned char
        s[i] = (char)tolower((unsigned char)s[i]);
    }
    return s;
}

void find_string(char *emailSearch){
    for(size_t i = 0; i < sizeof spam / sizeof *spam; i++) {
        if(strstr(emailSearch, spam[i])) {
            printf("Your email message is considered spam!\n");
            return;
        }
    }
    printf("Your email is not spam!\n");
}

int main(){
    char email[LEN + 1];
    printf("Enter your short email message: \n");
    fgets(email, LEN + 1, stdin);
    find_string(strtolower(email));
    return 0;
}
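
strcasestr() is a GNU/BSD extension rather than standard C, so it may not be available everywhere. As a rough sketch of a portable substitute, here is a small case-insensitive search that leaves the message untouched. The name contains_ci is made up for this example and is not part of any library:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if needle occurs in haystack, ignoring case. */
int contains_ci(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    if (*needle == '\0')
        return 1; /* the empty string matches everywhere, as with strstr() */

    for (const char *h = haystack; *h != '\0'; h++) {
        size_t k = 0;
        /* Compare character by character, folding both sides to lower case. */
        while (needle[k] != '\0' && h[k] != '\0' &&
               tolower((unsigned char)h[k]) == tolower((unsigned char)needle[k]))
            k++;
        if (needle[k] == '\0')
            return 1; /* reached the end of needle: full match */
    }
    return 0;
}
```

With this, find_string() could call contains_ci(emailSearch, spam[i]) in its loop and skip the lower-casing pass entirely. It is still a plain substring match, so "here" will still be found inside "there".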

The next step would be to split your email into words so the spam word "here" will not cause the unrelated email word "there" to be treated as spam. You can then use strcmp() to compare each email word against the words in the spam list. If you sort your spam list, you could use bsearch() instead of a linear search. Alternatively, consider using a hash table for your spam list.
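
A minimal sketch of the word-splitting idea with bsearch(), assuming words are runs of letters and digits. The list here is a shortened, pre-sorted subset used only for illustration, and count_spam_words, cmp_word and sorted_spam are made-up names:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset of the spam list; it must stay sorted for bsearch(). */
static const char *const sorted_spam[] = {
    "click", "deal", "free", "here", "urgent", "win", "winner"
};

/* bsearch() passes the key and a pointer to an array element. */
static int cmp_word(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Hypothetical helper: counts whole words of text that are on the list. */
int count_spam_words(const char *text)
{
    assert(text != NULL);
    char word[32];
    int score = 0;

    while (*text != '\0') {
        /* Skip anything that is not a letter or digit. */
        while (*text != '\0' && !isalnum((unsigned char)*text))
            text++;

        /* Copy one lower-cased word, silently truncating overlong ones. */
        size_t n = 0;
        while (isalnum((unsigned char)*text)) {
            if (n < sizeof word - 1)
                word[n++] = (char)tolower((unsigned char)*text);
            text++;
        }
        word[n] = '\0';

        if (n > 0 && bsearch(word, sorted_spam,
                             sizeof sorted_spam / sizeof *sorted_spam,
                             sizeof *sorted_spam, cmp_word) != NULL)
            score++;
    }
    return score;
}
```

Because only whole words are compared, "there" no longer counts as "here". The trade-off is that entries containing punctuation, such as "100%", and multi-word phrases would need separate handling, since this tokenizer splits on anything that is not alphanumeric.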

The following step after that is to implement some type of stemming so "congratulations" would again be considered spam because the root word "congratulation" is on the spam list.
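
Real stemming (for example the Porter algorithm) is considerably more involved. As a very crude stand-in, and only to show where such a step would plug in, a word could have a few common suffixes stripped before the lookup. The function crude_stem and its suffix list are assumptions made up for this sketch:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: crude suffix stripping, NOT a real stemmer.
   Modifies word in place and returns it. */
char *crude_stem(char *word)
{
    assert(word != NULL);
    static const char *const suffixes[] = { "ing", "ed", "s" };
    size_t len = strlen(word);

    for (size_t i = 0; i < sizeof suffixes / sizeof *suffixes; i++) {
        size_t slen = strlen(suffixes[i]);
        /* Keep at least three characters so short words survive intact. */
        if (len >= slen + 3 && strcmp(word + len - slen, suffixes[i]) == 0) {
            word[len - slen] = '\0'; /* cut the suffix off */
            break;                   /* strip at most one suffix */
        }
    }
    return word;
}
```

Calling crude_stem() on each email word before comparing it would map "congratulations" to "congratulation". It also mangles words it should leave alone ("bonus" becomes "bonu"), which is exactly why a proper stemming algorithm is worth the effort.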

CodePudding user response:

For my critique of the OP code, please refer to comments below the OP question.

As I pointed out in those comments, this is a weak spam detection scheme. strstr() is indiscriminate and will match any sequence of characters it can. E.g., it will find the word "town" in the word "boatowner". There will be a lot of false positives.

Anyway, since @Allan and I have such a good time at this, here's an adaptation of a search routine written for an SO question just a few hours ago (https://stackoverflow.com/a/74022127/17592432). You be the judge.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

char *blacklist[] = {
    "congratulation",
    "free",     "100%",     "earn",     "million",  "click",    "here",
    "instant",  "limited",  "urgent",   "winner",   "selected", "bargain",
    "deal",     "debt",     "lifetime", "cheap",    "easy",     "bonus",
    "credit",   "bullshit", "scam",     "junk",     "spam",     "passwords",
    "invest",   "bulk",     "exclusive","win",      "sign"
};

int rate( char *str ) {
    for( char *p = str; *p; p++ ) *p = (char)tolower( (unsigned char)*p );

    int cnt = 0;
    for( size_t i = 0; i < sizeof blacklist/sizeof blacklist[0]; i++ )
        for( char *p=str, *bl=blacklist[i]; (p = strstr(p, bl) ) != NULL; p++, cnt++ )
            printf( "'%s' ", bl );

    return cnt;
}

int main() {
    // writable arrays, because rate() lower-cases each message in place
    char emails[][512] = {
        "dear gramma,\n"
        "today i selected a puppy and a fish for my birthday\n"
        "how are you? are your investments showing signs of improving?\n"
        "and what' the deal with my instant gratification?\n"
        "i don't want to have to earn my million during my lifetime.\n"
        "i want yours. it's kinda urgent!\n"
        "love, your kid's kid\n",

        "Dear ex-Subscriber,\n"
        "We want you back as a valued customer.\n"
        "Changing your mind and renewing now, in response to this email,\n"
        "will allow us to alert you to more opportunities \n"
        "to purchase crap from us and our suppliers.\n"
        "Don't waste a moment. We can click again if you'll just click reply.\n"
    };

    for( size_t i = 0; i < sizeof emails/sizeof emails[0]; i++ ) {
        puts( "-----------------------------------");
        puts( emails[i] );
        int rating = rate( emails[i] );
        printf( "\n***Rating %d - ", rating );

        if( rating > 8 )
            puts( "Email message is considered spam\n" );
        else
            puts("Email is not spam!\n");
    }

    return 0;
}
-----------------------------------
dear gramma,
today i selected a puppy and a fish for my birthday
how are you? are your investments showing signs of improving?
and what' the deal with my instant gratification?
i don't want to have to earn my million during my lifetime.
i want yours. it's kinda urgent!
love, your kid's kid

'earn' 'million' 'instant' 'urgent' 'selected' 'deal' 'lifetime' 'invest' 'win' 'sign'
***Rating 10 - Email message is considered spam

-----------------------------------
Dear ex-Subscriber,
We want you back as a valued customer.
Changing your mind and renewing now, in response to this email,
will allow us to alert you to more opportunities
to purchase crap from us and our suppliers.
Don't waste a moment. We can click again if you'll just click reply.

'click' 'click' 'win'
***Rating 3 - Email is not spam!

Those two blacklisted occurrences of "win" come from "showing" and "renewing" in the two messages: strstr() matching is not sophisticated. Improving this is left as an exercise for the reader.
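
One possible direction for that exercise, sketched under the assumption that text and word are both already lower case: keep strstr(), but only count a hit when it is not embedded in a longer run of letters or digits. The name count_whole_word is made up for this example:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of word in text that are not
   part of a longer alphanumeric run. Both are assumed to be lower case. */
int count_whole_word(const char *text, const char *word)
{
    assert(text != NULL && word != NULL);
    size_t len = strlen(word);
    int cnt = 0;

    if (len == 0)
        return 0; /* an empty word would match at every position */

    for (const char *p = text; (p = strstr(p, word)) != NULL; p++) {
        /* The character before the hit must not be a letter or digit... */
        int starts = (p == text) || !isalnum((unsigned char)p[-1]);
        /* ...and neither must the one after it ('\0' counts as a boundary). */
        int ends = !isalnum((unsigned char)p[len]);
        if (starts && ends)
            cnt++;
    }
    return cnt;
}
```

With this check, "win" is no longer found in "showing" or "renewing", and multi-word phrases still work because the search itself is unchanged. The flip side is that "investments" no longer matches "invest", so the threshold in main() would need retuning.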
