Easiest way to get the substring by eliminating the unwanted string in C#

Time:12-10

I am trying to read the string below, which is OCR-extracted text.

ex-1. " My name is jack sparrow:"

Here, when I search for the string "name is jack", the output I expect is "name{99.03} is{85.37} jack{95.42}", taken from the string below (each matched word together with the bracketed value to its right):

ex-2. " My{99.64} name{99.03} is{85.37} jack{95.42} sparrow:{99.26}"

I am using the code below to get "name is jack", which works perfectly, but I cannot get the expected output above because of the brackets in the ex-2 string.

                int pFrom = result.ToLower().IndexOf(startWord) + startWord.Length;
                int pTo = result.ToLower().IndexOf(endWord, pFrom);
                result = result.Substring(pFrom, pTo - pFrom).Trim();

Any help would be appreciated. Thanks in advance.

CodePudding user response:

Let's say you found your search string in ex1 at character indices 4..15:

              1
    0123456789012345
ex1: My name is jack sparrow:
        ^^^^^^^^^^^^
ex2: My{99.64} name{99.03} is{85.37} jack{95.42} sparrow:{99.26}

Then we tokenize:

ex1 [" My","name","is","jack","sparrow:"]
ex2 [" My{99.64}","name{99.03}","is{85.37}","jack{95.42}","sparrow:{99.26}"]
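The tokenizing step could be sketched in C# roughly as follows. This is only an illustration: the class and method names (`TokenizeSketch`, `Tokenize`, `StripScore`) are made up for this example, and unlike the listing above it drops the leading space instead of keeping it on the first token. It also assumes the `{...}` values never contain spaces, so both strings split into the same number of tokens.

```csharp
using System;

public static class TokenizeSketch
{
    // Splits on spaces and discards empty entries, so the leading space
    // in the OCR text does not produce an empty first token.
    public static string[] Tokenize(string text) =>
        text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Removes a trailing "{...}" annotation from a token, if there is one.
    // Assumption: the first '{' in a token starts the annotation.
    public static string StripScore(string token)
    {
        int brace = token.IndexOf('{');
        return brace < 0 ? token : token.Substring(0, brace);
    }
}
```

Stripping the scores from the ex2 tokens should give back the ex1 tokens, which is a cheap way to check that the two arrays really line up index by index before relying on that.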

Then we can find indices:

" My" - length = 3, startIndex was 4 -> nope,
we add 1 for the whitespace, so we are at (string-)index 4, which is equal to start
So, array-index 1 is in => [1]

"name" - length = 4, 4 + 4 + 1 (whitespace) = 9
9 is < 15 => array-index 2 is still in => [1,2]

"is" - length = 2, 9 + 2 + 1 (whitespace) = 12 < 15 => [1,2,3]

"jack" - length = 4, 12 + 4 = 16 > 15 => DONE! Result = [1,2,3]

So, now we know the search result consists of indices 1 to 3 of ex1. That means our desired result consists of indices 1 to 3 of ex2.
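One possible way to code the index search is to walk the plain string once and record every token that overlaps the matched character range. The sketch below is a variation on the running-sum arithmetic above, not a literal transcription of it; `TokenRangeSketch` and `TokensInRange` are hypothetical names, and tokens are assumed to be separated by plain spaces only.

```csharp
using System;
using System.Collections.Generic;

public static class TokenRangeSketch
{
    // Returns the indices of the space-separated tokens in `text` that
    // overlap the character range [start, end] (both ends inclusive).
    public static List<int> TokensInRange(string text, int start, int end)
    {
        var hits = new List<int>();
        int tokenIndex = 0;
        int pos = 0;
        while (pos < text.Length)
        {
            // Skip separators between tokens (and any leading space).
            if (text[pos] == ' ') { pos++; continue; }

            // Scan to the end of the current token.
            int tokenStart = pos;
            while (pos < text.Length && text[pos] != ' ') pos++;
            int tokenEnd = pos - 1;

            // Keep the token if it overlaps the match.
            if (tokenStart <= end && tokenEnd >= start) hits.Add(tokenIndex);
            tokenIndex++;
        }
        return hits;
    }
}
```

For `TokensInRange(" My name is jack sparrow:", 4, 15)` this yields the indices 1, 2 and 3, matching the hand calculation.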

Then we can concatenate:

ex2[1] + " " + ex2[2] + " " + ex2[3] =
"name{99.03}" + " " + "is{85.37}" + " " + "jack{95.42}" =

"name{99.03} is{85.37} jack{95.42}"

BOOM!

This should also work for the case mentioned in the comment.
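Putting the steps together, a minimal end-to-end sketch might look like this. The names `AnnotatedSubstringSketch` and `Extract` are invented for illustration. It assumes the plain and annotated strings contain the same words in the same order, that words are separated by spaces, and that the `{...}` values contain no spaces; it returns an empty string when the phrase is not found.

```csharp
using System;
using System.Linq;

public static class AnnotatedSubstringSketch
{
    private static readonly char[] Separators = { ' ' };

    // Finds `phrase` in `plain` and returns the corresponding tokens of
    // `annotated`, joined by single spaces.
    public static string Extract(string plain, string annotated, string phrase)
    {
        phrase = phrase.Trim();
        if (phrase.Length == 0) return string.Empty;

        int start = plain.IndexOf(phrase, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return string.Empty;

        // Number of tokens before the match = index of the first matched token.
        int first = plain.Substring(0, start)
                         .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                         .Length;
        // If the match starts mid-word, that partial word was counted above,
        // but it belongs to the match, so step back by one.
        if (start > 0 && plain[start - 1] != ' ') first--;

        int count = phrase.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;

        string[] tokens = annotated.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        if (first + count > tokens.Length) return string.Empty;

        return string.Join(" ", tokens.Skip(first).Take(count));
    }
}
```

With ex1 as `plain`, ex2 as `annotated` and "name is jack" as the phrase, `Extract` returns "name{99.03} is{85.37} jack{95.42}". The case-insensitive `IndexOf` overload stands in for the `ToLower()` calls in the question.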
