Home > OS >  c# Remove overlapping parts of two strings
c# Remove overlapping parts of two strings

Time:12-12

I have strings s1 and s2 of size 100000.

The front part of s2 is the same as [0~n] to [s1.Lenght - 1] of s1.

I want to find the overlapping part of s1 in s2 and remove it.

example 1)

str1 = "abcdefgh"
str2 = "cdefghijklmno";

i want: ijklmno

example 2)

str1 = "abcdefgh"
str2 = "efghijklmno";

i want: ijklmno

example 2)

str1 = "aaabbb"
str2 = "abbbfffxx"

i want: fffxx

The length of the string is always over 100000 and it takes too long to compare them one by one because it has to be done in real time.

Do you have any good solution?

CodePudding user response:

So the substring is always at the beginning? Then this works and is efficient:

public static string RemovePartFromStart(string full, string part, out string prefixFound)
{
    for (int i = 0; i < part.Length; i  )
    {
        string part2 = part.Substring(i);
        if (full.StartsWith(part2))
        {
            prefixFound = part2;
            return full.Substring(part2.Length);
        }
    }

    prefixFound = null;
    return full;
}

Your samples:

var strings = new List<(string part, string full)> { ("abcdefgh", "cdefghijklmno"), ("abcdefgh", "efghijklmno"), ("aaabbb", "abbbfffxx") };
foreach (var x in strings)
{
    Console.WriteLine(RemovePartFromStart(x.full, x.part, out string prefixFound));
}

output:

ijklmno
ijklmno
fffxx

.NET Fiddle with 100k-length strings: https://dotnetfiddle.net/khomWB (takes ~0.1sec. for all)

CodePudding user response:

Please let me know if the following code answers your question

using System;
using System.Text.RegularExpressions;

namespace Test
{
    class Program
    {
        static void Main()
        {
            string s1 = "cdef";
            string s2 = "abcdefghijklmnopqrstuvwxyz";
            Regex regex = new Regex($"(?:{s1}|[{s1}] )(.*)");
            string result = regex.Match(s2).Groups[1].Value;
            Console.WriteLine(result);
        }

    }
}


Fabio

CodePudding user response:

 public static string RemovePartFromStart(string s1, string s2)
 {
        char lastPart = s1.Last();

        int index = Math.Min(s2.Length - 1, s1.Length - 1);
        while(index > 0)
        {
            if (s2[index] == lastPart)
            {
                if (s2[0] == s1[s1.Length - 1 - index])
                    return s2.Substring(index   1);
            }
            --index;
        }

        return "";
 }

CodePudding user response:

My take on it using a regex:

using System.Text.RegularExpressions;
//...
static string NonOverlap(string s1, string s2)
{
    var re = string.Join("?", s1.ToCharArray())   "?";
    var m = Regex.Match(s2, re);
    int s = m.Index   m.Length;
    return s2.Substring(s);
}

static void Main(string[] args)
{
    string str1 = "abcdefgh";
    string str2 = "cdefghijklmno";

    Console.WriteLine(NonOverlap(str1, str2));
    str1 = "abcdefgh";
    str2 = "efghijklmno";

    Console.WriteLine(NonOverlap(str1, str2));
    str1 = "aaabbb";
    str2 = "abbbfffxx";
    Console.WriteLine(NonOverlap(str1, str2));

    Console.ReadLine();

}

Outputs:

ijklmno
ijklmno
fffxx

It turns the prefix string, say "abc", into a regex like "a?b?c?" which means each letter is optional so it copes with the case where the overlap starts some way into str1.

  •  Tags:  
  • c#
  • Related