Home > Enterprise >  Comparing two string using XOR return true but the strings are different
Comparing two string using XOR return true but the strings are different

Time:02-18

I'm testing some ways to identify anagrams and I found a situation that got me off guard. I found out that it's possible to do using XOR so I was testing it using the XOR operator. Here's my code:

public static void main(String[] args) {
    // TODO code application logic here
    
    String s1 = "pe";
    String s2 = "ep";
    
    System.out.println(isAnagram(s1, s2));
}

private static boolean isAnagram(String firstString, String secondString) 
{
    int control = 0;
    
    System.out.println("Comparing: "   firstString   " and "   secondString);
    
    for (int i = 0; i < firstString.length(); i  ) {
        control = control ^ firstString.charAt(i);
    }
    
    for (int i = 0; i < secondString.length(); i  ) {
        control = control ^ secondString.charAt(i);
    }
    
    System.out.println("Control: "   control);
    return (control == 0);
}

When the 2 strings have the same sets of characters, even when they are not in the same order the control variable is 0 returning true to anagram. However, when the 2 strings are different the control has a value > 0 returning false to anagram. I tried using many words and most of them worked, but for some reason it often has some strange situations where, for example, "v" and "ils" return true to anagram or "tat" and "atata" returns true as well.

I would like to understand why it happens and what should I do so this situations doesn't show up anymore.

CodePudding user response:

Simply, the algorithm you are using is not going to work. Since XOR is associative and commutative (like, for example, addition), XORing together all the characters in a string produces the same value regardless of the order in which you do the XORs. Similarly, you get the same sum of the values in an array regardless of the order in which you do the additions.

But, also like addition, XOR throws away information. You cannot go backwards from the result to the original values: 1 3 = 2 2 = 0 4. And similarly with XOR: 1^3 = 6^4 = 0^2.

One particular feature of XOR is that a ^ a = 0 for any a; also a ^ 0 = a. (These statements are related.) So you can always just remove pairs of identical characters; the XOR combination of atata is the same as the combination of tat and also the same as a.

CodePudding user response:

So you're going to continue running into these issues due to how the function of bitwise operators. v has an acsii of 01110110,i has an acsii of 01101001, l has an acsii of 01101100, and s has an acsii of 01110011.

Here's the line by line of the comparisons that lead to 00000000 being returned.

v - 01110110
i - 01101001
new:00011111
l - 01101100
new:01110011
s - 01110011
new:00000000

Each "new" is your control and the individual comparisons that lead to 00000000 or your true result.

CodePudding user response:

This can be solved with one loop:

private static boolean isAnagram(String firstString, String secondString) 
{
    int control = 0;
        
    System.out.println("Comparing: "   firstString   " and "   secondString);

    for(int i = 0; i < firstString.length(); i  ) {
        control ^= firstString.charAt(i) ^ secondString.charAt(i);
    }
        
    System.out.println("Control: "   control);
    return (control == 0);
}

Algorithm:

Let x be the array of the first string's characters.
Let y be the array of the second string's characters.
Let c be the control.

The algorithm:

enter image description here

Example:

Let x = (a b c), y = (c b a), and c = 0.

At i = 0: c = 00000000 ^ (01100001 ^ 01100011) = 00000000 ^ 00000010 = 00000010

At i = 1: c = 00000010 ^ (01100010 ^ 01100010) = 00000010 ^ 00000000 = 00000010

At i = 2: c = 00000010 ^ (01100011 ^ 01100001) = 00000010 ^ 00000010 = 00000000

  • Related