substitution of one letter for another cipher key-CodePudding

So I am working on a SC50 problem where I need to make a simple cipher and be able to encrypt words or sentences... After 2 full days I actually kindof figured it all out, but my code was really long and after some googling I found a version out there that was much better. And it was real easy and all, except for the part where there is stuff that I don't really understand how it works, and I would really like to find out how... so here is the full code below (unfortunately I can't seem to find the original source of the code right now, but I actually did at least half of it myself, and only the part after "//SUBSTITUTION is copied) :

and also, what I wonder about, are these two rows:

printf("%c", toupper(arg[plaintext[i] - 65]));  //calculation to print the encipher text amd make sure it is Uppercase (case doesn't change)

and

printf("%c", tolower(argv[1][plaintext[i] - 97]));  ///calculation to print the encipher text amd make sure it is lowercase(case doesn't change)

...I can't wrap my head around, how the calculation "-65" and "-66" are solving the issue... Lets say that in my key, the first letter is a Q, and when I write and A, it should be substituted for a Q...

A = 65 and Q = 81 on the Ascii table, so when I take 65 - 65... mm why would I do that? obviously it needs to be done for this program to work correctly, but I don't understand how it works and what actually happens...

what is the logic behind these calculations? please help!

#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(int argc, string argv[]) {
    if (argc != 2) {
        printf("Usage: ./substitution key\n");
        return 1;
    }

    string arg = argv[1];
    int chars = 0;

    for (int i = 0; i < strlen(arg); i  ) {
        if (isalpha(arg[i])) {
            for (int j = i 1; j < strlen(arg); j  ) {
                if (toupper(arg[j]) == toupper(arg[i])) {
                    printf("Key must not contain repeated alphabets.\n");
                    return 1;
                }
            }

            chars  = 1;
        }
    }

    if (chars != 26) {
        printf("Key must contain 26 characters.\n");
        return 1;
    }

    // SUBSTITUTION
    printf("%s\n", arg);
    string plaintext = get_string("plaintext: ");   //Getting user's input as plaintext
    printf("ciphertext: ");                         //to print the ciphertext
    int plaintext_length = strlen(plaintext);       //get the strlen of plaintext (user's input)

    for (int i = 0; i < plaintext_length; i  ) {      //iterate over the plaintext_Length
        if (isupper(plaintext[i])) {                  //check if plaintext character is uppercase
            printf("%c", toupper(arg[plaintext[i] - 65]));  //calculation to print the encipher text amd make sure it is Uppercase (case doesn't change)
        }
        else if (islower(plaintext[i])) {  //check if plaintext character is lowercase
            printf("%c", tolower(arg[plaintext[i] - 97]));  ///calculation to print the encipher text amd make sure it is lowercase(case doesn't change)
        }
        else {   //if plaintext is anything else, print it like that without changing
            printf("%c", plaintext[i]);
        }
    }

    printf("\n");  //print new line    
}

CodePudding user response：

In general it is a bad code.

For example instead of using magic numbers 65 or 97

printf("%c", toupper(arg[plaintext[i] - 65]));

printf("%c", tolower(argv[1][plaintext[i] - 97]));

it is better to write

printf("%c", toupper(arg[plaintext[i] - 'A']));

printf("%c", tolower(argv[1][plaintext[i] - 'a']));

argv[1] or arg contains a string of 26 letters as for example

"xyzabcgtidefuvwjklmn0rspqh"

If you have a string as for example "Hello" then 'H' - 'A' gives the value 7. Using the number you can find at position 7 in the array pointed by teh string argv[1] or arg the letter t

"xyzabcgtidefuvwjklmn0rspqh"
        ^
        |
       'H'

So the letter 'H' in the source string is coded like the letter 'T'. For the second letter 'e' you have 'e' - 'a' is equal to 4. So you have

"xyzabcgtidefuvwjklmn0rspqh"
     ^
     |
    'e'

So the first two letter of the string "Hello" becomes "Tbllo". This approach is used for the remaining letters of the source string to encrypt it.

CodePudding user response：

"... I can't wrap my head around,..."

Yes, it is difficult to see what is going on in code that avoids (appropriately) using a temporary copy of the data of interest.

Without the comments, this is the "confusing" portion of your code, rewritten to temporarily use a single character variable:

// SUBSTITUTION
printf("%s\n", arg);
string plaintext = get_string("plaintext: ");   //Getting user's input as plaintext
printf("ciphertext: ");                         //to print the ciphertext
int plaintext_length = strlen(plaintext);       //get the strlen of plaintext (user's input)
for (int i = 0; i < plaintext_length; i  )      //iterate over the plaintext_Length
{
    char c = plaintext[i];
    if( isupper( c ) )
    {
        printf("%c", toupper(arg[c - 65]));
    }
    else if ( islower( c ) )
    {
        printf("%c", tolower(arg[c - 97]));
    }
    else
    {
        printf("%c", plaintext[i]);
    }
}

It is now obvious that there will be an output character, so 3 calls to printf() are distracting. Simplifying that leads to 're-using' the temporary char variable. (Here just showing the for() loop):

for (int i = 0; i < plaintext_length; i  )      //iterate over the plaintext_Length
{
    char c = plaintext[i];
    if( isupper( c ) )
    {
        c = toupper(arg[c - 65]);
    }
    else if ( islower( c ) )
    {
        c = tolower(arg[c - 97]);
    }
    else
    {
        c = plaintext[i];
    }
    printf( "%c", c );
}

It is now apparent that the final 'else' is redundant:

for (int i = 0; i < plaintext_length; i  )      //iterate over the plaintext_Length
{
    char c = plaintext[i];
    if( isupper( c ) )
    {
        c = toupper(arg[c - 65]);
    }
    else if ( islower( c ) )
    {
        c = tolower(arg[c - 97]);
    }
    printf( "%c", c );
}

The values '65' and '97' are called "magic numbers" (that you already understand correspond to ASCII 'A' and 'a' respectively.) Cleaning up that bad practice.

if( isupper( c ) )
{
    c = toupper( arg[ c - 'A' ] );
}
else if ( islower( c ) )
{
    c = tolower( arg[ c - 'a' ] );
}
printf( "%c", c );

It now is readily apparent that the 'case' of each input character determines the 'case' of the corresponding output character.

It should now also be readily apparent that the difference 'offset' from 'A' or 'a' of the plaintext character ( 'A/a' = 0, 'B/b' = 1, 'C/c' = 2) is being calculated. The result of that calculation becomes the INDEX of the 26 character enciphering key. Your 'Q' becomes '16' so the 16th character of the key is "looked up", turned into the appropriate case, and then used.

This operation can be further reduced as per the following:

for (int i = 0; i < plaintext_length; i  )      //iterate over the plaintext_Length
{
    char c = plaintext[i]; // copy of plaintext character
    if( isalpha( c ) ) { // translate only alphabetic chars
        c = tolower( c ) - 'a'; // 'a-z' ==> '0-25'
        c = arg[ c ]; // use as index into key.

        if( isupper( plaintext[i] ) // make case of enciphered char match input
            c = toupper( c );;
    }
    printf( "%c", c );
}

Or, even more:

for (int i = 0; i < plaintext_length; i  )      //iterate over the plaintext_Length
{
    char c = plaintext[i]; // copy of plaintext character
    if( isalpha( c ) ) { // translate only alphabetic chars
        c = arg[ tolower( c ) - 'a' ]; // select corresponding 'key' character

        if( isupper( plaintext[i] ) // make case of enciphered char match input
            c = toupper( c );;
    }
    printf( "%c", c );
}

Although that seems intricate, its brevity is its strength.

EDIT: isalpha() toupper() and tolower() are standard C functions. The code will need to: #include <ctype.h> to use those functions.

EDIT2: toupper() and tolower() will return an unsigned char. To compile without warnings, change the declaration of 'c':

unsigned char c = plaintext[i]; // copy of plaintext character

EDIT3: Your OP did not ask about the "validation code" that you say you wrote. I'm sorry, but it is insufficient. While it confirms there are 26 distinct characters in the key, a user could type a key containing additional punctuation sprinkled in. "abc" contains 3 distinct letters, but so does "a.b:!c"... You test for isalpha(). Why not halt immediately if a non-alpha char is found in the key? As written, illegitimate keys may be used and the 'enciphering' very, very incorrect...