I am trying to read the string below, which is text extracted by OCR.
ex-1. " My name is jack sparrow:"
When I search for the string "name is jack",
the output I am expecting is "name{99.03} is{85.37} jack{95.42}",
taken from the string below (each word is followed on its right by a value in braces):
ex-2. " My{99.64} name{99.03} is{85.37} jack{95.42} sparrow:{99.26}"
I am using the code below to get "name is jack".
It works perfectly on ex-1, but it cannot produce the expected output above because of the braces in the ex-2 string.
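// Index just past the first occurrence of startWord
int pFrom = result.ToLower().IndexOf(startWord) + startWord.Length;
// Index of the next occurrence of endWord at or after pFrom
int pTo = result.ToLower().IndexOf(endWord, pFrom);
// Text between the two markers, trimmed
result = result.Substring(pFrom, pTo - pFrom).Trim();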
Any help would be appreciated. Thanks in advance.
CodePudding user response:
Let's say you found your search string in ex1 => Index 4..15
               1
     0123456789012345
ex1:  My name is jack sparrow:
         ^^^^^^^^^^^^
ex2:  My{99.64} name{99.03} is{85.37} jack{95.42} sparrow:{99.26}
Then we tokenize:
ex1 [" My","name","is","jack","sparrow:"]
ex2 [" My{99.64}","name{99.03}","is{85.37}","jack{95.42}","sparrow:{99.26}"]
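The tokenizing step could be sketched in C# as follows. This is only an illustration: the class and method names (`OcrTokens`, `Tokenize`) are hypothetical, and it assumes words are separated by whitespace. Unlike the listing above, it drops the leading space, so the first token is "My" rather than " My"; the array indices of the other tokens are unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OcrTokens
{
    // Returns the runs of non-whitespace characters in order,
    // e.g. " My name is jack" -> ["My", "name", "is", "jack"].
    public static string[] Tokenize(string text) =>
        Regex.Matches(text, @"\S+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

Calling `OcrTokens.Tokenize` on ex1 and on ex2 should give two arrays of the same length, where element i of one corresponds to element i of the other.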
Then we can find indices:
" My" - length = 3, startIndex was 4 -> nope; we add a space, so we are at (string) index 4, which is equal to start. So array index 1 is in => [1]
"name" - length = 4, 4 + 4 + 1 (whitespace) = 9; 9 is < 15 => array index 2 is still in => [1,2]
"is" - length = 2, 9 + 2 + 1 (whitespace) = 12 < 15 => [1,2,3]
"jack" - length = 4, 12 + 4 = 16 > 15 => DONE! Result = [1,2,3]
So now we know the search result consists of indices 1 to 3 of ex1. That means our desired result consists of indices 1 to 3 of ex2.
Then we can concatenate:
ex2[1] + " " + ex2[2] + " " + ex2[3]
= "name{99.03}" + " " + "is{85.37}" + " " + "jack{95.42}"
= "name{99.03} is{85.37} jack{95.42}"
BOOM!
This should also work for the case mentioned in the comments.
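Putting the steps together, a minimal C# sketch might look like the following. It is one possible reading of the approach, not a definitive implementation. The names `OcrSearch` and `FindAnnotated` are hypothetical. It assumes that both strings contain the same words in the same order and that words are separated by whitespace. It uses `StringComparison.OrdinalIgnoreCase` in place of the `ToLower()` calls from the question, and it compares character ranges directly instead of adding up token lengths.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OcrSearch
{
    // plain:     the OCR text without values, e.g. " My name is jack sparrow:"
    // annotated: the same words, each followed by "{value}"
    // Returns the annotated tokens that cover the first match of search in plain,
    // or an empty string if there is no match or the two strings do not line up.
    public static string FindAnnotated(string plain, string annotated, string search)
    {
        if (search.Length == 0) return string.Empty;
        int start = plain.IndexOf(search, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return string.Empty;
        int end = start + search.Length; // exclusive end of the match

        // Each Match carries the token's start index and length in the source string.
        Match[] plainTokens = Regex.Matches(plain, @"\S+").Cast<Match>().ToArray();
        string[] annotatedTokens = Regex.Matches(annotated, @"\S+")
            .Cast<Match>().Select(m => m.Value).ToArray();
        if (plainTokens.Length != annotatedTokens.Length) return string.Empty;

        // Keep every token whose character range overlaps [start, end).
        var hits = Enumerable.Range(0, plainTokens.Length)
            .Where(i => plainTokens[i].Index < end &&
                        plainTokens[i].Index + plainTokens[i].Length > start)
            .Select(i => annotatedTokens[i]);

        return string.Join(" ", hits);
    }
}
```

With the ex1 and ex2 strings above and the search string "name is jack", the match covers characters 4 to 15, which overlap tokens 1 to 3, so the call returns "name{99.03} is{85.37} jack{95.42}". Because whole tokens are kept, a search that starts or ends in the middle of a word returns the full annotated word.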