Extract list of substring from string and remove from string

Time:08-04

I have a string and must extract all the substrings between two different words ('alpha' and 'beta'), removing them from the original string. I must return a JSON object with two fields.

I tried it this way, but it doesn't work correctly:

            string content = "string working on";
            var listSubString = new List<string>();
            int index = 0;
            do
            {
                index = content.LastIndexOf("alpha");
                if (index != -1)
                {
                    var length = content.IndexOf("beta");
                    string substring= content.Substring(index, length);
                    content = content.Replace(substring, string.Empty);
                    listSubString.Add(substring.Replace("alpha", string.Empty).Replace("beta", string.Empty));
                }
            } while (index != -1);
Content = content;
ListSubString = listSubString;

Given a string like "hello alpha I don't want this part 1 beta world alpha i don't want this part 2 beta have a nice day", I'd like to receive a JSON like {Content : "hello world have a nice day", ListSubString : ["I don't want this part 1", "i don't want this part 2"]}

Thanks for the help

CodePudding user response:

I got the output you want. I hope it serves your purpose.

Link : https://dotnetfiddle.net/b8T8qy

These are the lines I changed in your code:

string substring= content.Substring(index, length-1);
listSubString.Insert(0, substring.Replace("alpha", string.Empty).Replace("beta", string.Empty));

Finally, I built the output as a JSON string:

string json = string.Join("\",\"", listSubString);
string otp = "{\"Content\" : \"" + content + "\",\"ListSubString\": [\"" + json + "\"]}";

Output :

{
  "Content": "hello world have a nice day",
  "ListSubString": [
    " I don't want this part 1  ",
    " i don't want this part 2  "
  ]
}
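
One caveat: in the snippet above, the length passed to Substring comes from the position of the first "beta", so it only lines up with the span being removed because both removed parts of the sample input happen to have the same length. Below is a minimal sketch of an index-based variant that pairs each "alpha" with the next "beta" after it. The class name `MarkerExtractor`, the method `Extract`, and its `start`/`end` parameters are hypothetical names chosen for illustration, and it assumes the trimmed, single-spaced output shown in the question is what is wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MarkerExtractor
{
    // Hypothetical helper: returns the text outside the markers
    // and the list of texts found between them.
    public static (string Content, List<string> ListSubString) Extract(
        string input, string start = "alpha", string end = "beta")
    {
        var kept = new StringBuilder();
        var extracted = new List<string>();
        int position = 0;

        while (true)
        {
            // Search forward from the end of the previous span.
            int startIndex = input.IndexOf(start, position, StringComparison.Ordinal);
            if (startIndex == -1) break;

            int bodyIndex = startIndex + start.Length;
            int endIndex = input.IndexOf(end, bodyIndex, StringComparison.Ordinal);
            if (endIndex == -1) break; // unmatched start marker: keep the rest as-is

            kept.Append(input, position, startIndex - position);
            extracted.Add(input.Substring(bodyIndex, endIndex - bodyIndex).Trim());
            position = endIndex + end.Length;
        }

        kept.Append(input, position, input.Length - position);

        // Collapse the runs of spaces left behind by the removed spans.
        string content = string.Join(" ",
            kept.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        return (content, extracted);
    }
}
```

Called on the sample string from the question, this should give "hello world have a nice day" as Content and the two inner phrases, without surrounding spaces, as ListSubString.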

CodePudding user response:

Regular expressions allow you to accomplish this without indexes and loops.

Once you have identified a pattern that describes the substrings you are looking to extract e.g. "alpha.*?beta", then rebuilding the content without said substrings is just a matter of concatenating the fragments split by a regular expression:

Content = string.Join(string.Empty, new Regex("alpha.*?beta").Split(text));

As for the substrings themselves, you can capture them in the pattern and extract them from the matches returned by the regular expression:

ListSubString = new Regex("alpha(.*?)beta")
    .Matches(text)
    .Select(match => match.Groups[1])
    .SelectMany(group => group.Captures.OfType<Capture>())
    .Select(capture => capture.Value)
    .ToList();

You can have a look at this answer for some clarification on the Match > Group > Capture hierarchy.
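
For reference, here is a minimal self-contained sketch that puts both steps together. It is a variation on the snippets above rather than the exact same code: the pattern also swallows the whitespace around the markers (so the captured text is already trimmed), and `Regex.Replace` is used instead of Split plus Join. The class name `RegexExtractor` and the method `Extract` are hypothetical, and it assumes "alpha" and "beta" never occur as parts of other words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexExtractor
{
    // Whitespace around the markers is part of the match, so it is removed with them.
    // "(.*?)" is lazy: it stops at the first "beta" that follows each "alpha".
    private static readonly Regex Pattern =
        new Regex(@"\s*alpha\s*(.*?)\s*beta\s*", RegexOptions.Singleline);

    public static (string Content, List<string> ListSubString) Extract(string text)
    {
        // Groups[1] holds what "(.*?)" captured in each match.
        var listSubString = Pattern.Matches(text)
            .Cast<Match>()
            .Select(match => match.Groups[1].Value)
            .ToList();

        // Replace each removed span with a single space, then trim the ends.
        string content = Pattern.Replace(text, " ").Trim();
        return (content, listSubString);
    }
}
```

The `.Cast<Match>()` call is only needed on older frameworks where `MatchCollection` is not generic; it is harmless elsewhere.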

CodePudding user response:

Thanks to everyone who responded to me. In the end, I decided to do it in this way:

var splitedContent = content.Split(new string[] { "alpha", "beta" }, StringSplitOptions.None);

Content = string.Join(" ", splitedContent.Where((_, index) => index % 2 == 0));
Css = splitedContent.Where((_, index) => index % 2 != 0).ToList<string>();

Regex is probably the best and most performant solution, but I don't fully understand how it works, so for now this is my solution.
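
A small sketch of the same split idea with the pieces trimmed, so the joined content does not end up with doubled spaces. It relies on the pieces strictly alternating (outside, inside, outside, ...), which only holds if every "alpha" is followed by a matching "beta"; for the same reason it keeps `StringSplitOptions.None`, since dropping empty entries would shift the even/odd positions. `SplitExtractor` and `Extract` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitExtractor
{
    public static (string Content, List<string> ListSubString) Extract(string content)
    {
        // Even positions are outside the markers, odd positions are inside,
        // assuming the markers are well-formed.
        var pieces = content
            .Split(new[] { "alpha", "beta" }, StringSplitOptions.None)
            .Select(piece => piece.Trim())
            .ToList();

        // Skip empty outside pieces so Join does not add extra spaces.
        var outside = pieces.Where((piece, index) => index % 2 == 0 && piece.Length > 0);
        var inside = pieces.Where((_, index) => index % 2 != 0).ToList();

        return (string.Join(" ", outside), inside);
    }
}
```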
