Comparing visually similar strings in C#

Time:12-03

New programmer here so sorry if this is an obvious question. I need to write a very simple bit of code in C# (for an exercise, so no database to connect to) to compare two visually similar strings e.g. FOX and F0X.

From what I can tell, most comparison methods would come back saying that they aren't similar because O and 0 are different characters, so I'm at a bit of a loss as to how to go about this!

Any pointers would be much appreciated! Thanks

CodePudding user response:

This is not a trivial task, and there is no single predefined way of doing this.

But thinking about how you might approach it, you'll probably need to start by building a collection of rules about which characters you consider to look similar. And then look up those rules when comparing characters.

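Something like this (written script-style; in a regular console project, put these members in a class and add `using System;`, plus `using System.Linq;` on older frameworks where `string.Contains(char)` is not built in):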

string[] SimilarCharacters = new string[]
{
    "O0",
    "I1"
    // Etc...
};

void Main()
{
    Console.WriteLine(AreSimilar("FOX", "F0X"));
    Console.WriteLine(AreSimilar("BOX", "B0X"));
    Console.WriteLine(AreSimilar("FIG", "F1G"));
    Console.WriteLine(AreSimilar("J1G", "JIG"));
}

bool AreSimilar(string s1, string s2)
{
    // No match if different lengths
    if (s1.Length != s2.Length)
        return false;

    for (int i = 0; i < s1.Length; i++)
    {
        if (s1[i] != s2[i])
        {
            string similar = FindSimilar(s1[i]);
            if (similar == null)
                return false;
                
            if (!similar.Contains(s2[i]))
                return false;
        }
    }
    return true;
}

string FindSimilar(char c)
{
    for (int i = 0; i < SimilarCharacters.Length; i++)
    {
        if (SimilarCharacters[i].Contains(c))
            return SimilarCharacters[i];
    }
    return null;
}
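
One thing to watch with the code above: FindSimilar returns only the first group that contains a character, so a character listed in more than one group (say '1' in both "I1" and "l1") is only ever checked against the first. A minimal sketch of a variant that checks every group is shown here; the class name SimilarityRules, the method CharsSimilar and the extra "l1" group are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Linq;

public static class SimilarityRules
{
    // Hypothetical rule set; '1' deliberately appears in two groups.
    private static readonly string[] Groups = { "O0", "I1", "l1" };

    // Two characters are similar if they are equal or share at least one group.
    public static bool CharsSimilar(char a, char b)
    {
        return a == b || Groups.Any(g => g.IndexOf(a) >= 0 && g.IndexOf(b) >= 0);
    }

    public static bool AreSimilar(string s1, string s2)
    {
        // No match if different lengths (same rule as the code above)
        if (s1.Length != s2.Length)
            return false;

        for (int i = 0; i < s1.Length; i++)
        {
            if (!CharsSimilar(s1[i], s2[i]))
                return false;
        }
        return true;
    }
}
```

With these groups, SimilarityRules.AreSimilar("10G", "l0G") and SimilarityRules.AreSimilar("l0G", "10G") both return true. Note that 'I' and 'l' are still not considered similar to each other, because no single group contains both; whether that is what you want depends on your rules.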

CodePudding user response:

I think a reasonable approach is to replace every group of similar characters (or character sequences) with a single representative, for example replacing every '0' with 'O' (or vice versa), and then compare the results. That way you only have one character per group to deal with. Note that you need to decide how to handle uppercase and lowercase: I assume you wouldn't want an 'o' to be replaced with an 'O' (or maybe you do). Either way, I think this is a good method.

        using System.Text;
        
        (...)

        string a = "0()P is e><ce1|ent"; //OOP, Object Oriented Programming
        StringBuilder b = new StringBuilder();

        bool parenthesesZero = false;
        bool greaterThanX = false;

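        for (int i = 0; i < a.Length; i++)
        {
            // Remember whether the previous character opened a two-character
            // sequence, then clear the flags so they only apply to adjacent characters.
            bool afterParenthesis = parenthesesZero;
            bool afterGreaterThan = greaterThanX;
            parenthesesZero = false;
            greaterThanX = false;

            switch (a[i])
            {
                case '0':
                    b.Append('O');
                    break;
                case '(':
                    b.Append('(');
                    parenthesesZero = true;
                    break;
                case ')':
                    if (afterParenthesis)
                        b[b.Length - 1] = 'O'; // "()" looks like 'O'
                    else
                        b.Append(')');
                    break;
                case '>':
                    b.Append('>');
                    greaterThanX = true;
                    break;
                case '<':
                    if (afterGreaterThan)
                        b[b.Length - 1] = 'X'; // "><" looks like 'X'
                    else
                        b.Append('<');
                    break;
                case '|':
                case 'I':
                case '1':
                    b.Append('l');
                    break;
                default:
                    b.Append(a[i]);
                    break;
            }
        }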
            

        Console.WriteLine(a + "\n" + b);

Notice how I'm stacking case labels to lump a group of similar characters into one, and how I'm keeping track of multi-character sequences that may be replaced by one (or several) characters. If a sequence is longer than two characters, consider using an integer counter instead of a boolean flag.
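
To tie this back to the original question, the same substitution idea can be applied to both strings before comparing them. What follows is only a sketch under a few assumptions: the class name LookalikeNormalizer, its method names, and the contents of both tables are made up for illustration and are nowhere near a complete list of look-alike characters. Multi-character sequences are replaced first with string.Replace, which sidesteps flags and counters, and single characters are then mapped through a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LookalikeNormalizer
{
    // Hypothetical table: multi-character sequences that resemble one character.
    private static readonly (string Sequence, string Canonical)[] Sequences =
    {
        ("()", "O"),
        ("><", "X"),
    };

    // Hypothetical table: single characters mapped to one representative.
    private static readonly Dictionary<char, char> Singles = new Dictionary<char, char>
    {
        ['0'] = 'O',
        ['|'] = 'l',
        ['I'] = 'l',
        ['1'] = 'l',
    };

    public static string Normalize(string input)
    {
        // Collapse multi-character look-alikes first.
        foreach (var (sequence, canonical) in Sequences)
            input = input.Replace(sequence, canonical);

        // Then map each remaining character to its representative, if it has one.
        var result = new StringBuilder(input.Length);
        foreach (char c in input)
            result.Append(Singles.TryGetValue(c, out char mapped) ? mapped : c);
        return result.ToString();
    }

    // Two strings are similar if they normalize to the same text.
    public static bool AreSimilar(string s1, string s2)
    {
        return string.Equals(Normalize(s1), Normalize(s2), StringComparison.Ordinal);
    }
}
```

With these tables, LookalikeNormalizer.Normalize("0()P is e><ce1|ent") gives "OOP is eXcellent", and LookalikeNormalizer.AreSimilar("FOX", "F0X") returns true. Because the comparison happens after normalization, strings of different lengths such as "F()X" and "FOX" can also match, which a character-by-character comparison would reject.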

It may not be the best answer, but I hope you consider it useful.

Good day!
