How to use linq to group a list of strings on only certain strings

Time:09-04

example list of strings:

var test = new List<string>{
    "hdr1","abc","def","ghi","hdr2","lmn","opq","hdr3","rst","xyz"
};

I want to partition this list by "hdr*" so that each group contains elements...

"hdr1","abc","def","ghi"  
"hdr2","lmn","opq"  
"hdr3","rst","xyz"  

I tried:

var result = test.GroupBy(g => g.StartsWith("hdr"));

but this gives me two groups

"hdr1","hdr2","hdr3"  
"abc","def"..."xyz"  

What is the proper LINQ statement I should use? Let me emphasize that the strings following "hdr*" could be anything. The only thing they have in common is that they follow "hdr*".

CodePudding user response:

You could write an extension method GroupWhen that starts a new group whenever it finds a matching item. Just like Enumerable.GroupBy, it returns a sequence of groups:

public static IEnumerable<IGrouping<int, T>> GroupWhen<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    var i = 0;
    // This method "marks" which group each item belongs in
    // by creating a Tuple with the item and group number
    IEnumerable<(T Item, int GroupNum)> Iterate()
    {
        foreach (var item in source)
        {
            if (predicate(item)) i++; // Start new group
            yield return (item, i);
        }
    }
    // Group items by the "mark" from above and only
    // output the Item from the Tuple, since the
    // GroupNum will be the 'int' key of the group
    return Iterate().GroupBy(tup => tup.GroupNum, tup => tup.Item);
}

// Use like so:
var list = new List<string> {"hdr1","abc","def","ghi","hdr2","lmn","opq","hdr3","rst","xyz"};
var groups = list.GroupWhen(s => s.StartsWith("hdr"));
Console.WriteLine(string.Join(",", groups.First()));
// hdr1,abc,def,ghi

Check out this fiddle for a test run.

CodePudding user response:

Yes, you can do it with LINQ expressions. But I don't think it is any more readable than a foreach loop.

var test = new List<string>{
    "hdr1","abc","def","ghi","hdr2","lmn","opq","hdr3","rst","xyz"
};

int groupid = 0;

var result = test.GroupBy(t =>
{
    if (t.StartsWith("hdr")) ++groupid;
    return groupid;
}).ToList();

result.Select(t => string.Join(' ', t)).ToList().ForEach(Console.WriteLine);

/*
 Outputs:
hdr1 abc def ghi
hdr2 lmn opq
hdr3 rst xyz
 */
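
For comparison, the foreach loop mentioned above could look roughly like this. It is only a sketch: the name groups and the choice to put any items that appear before the first header into their own leading group are my assumptions, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;

var test = new List<string> {
    "hdr1","abc","def","ghi","hdr2","lmn","opq","hdr3","rst","xyz"
};

var groups = new List<List<string>>();
foreach (var s in test)
{
    // Start a new group at each header. The Count check is an assumption:
    // it catches items that appear before any header.
    if (s.StartsWith("hdr") || groups.Count == 0)
    {
        groups.Add(new List<string>());
    }
    // Append the item to the most recently started group.
    groups[groups.Count - 1].Add(s);
}

foreach (var g in groups)
{
    Console.WriteLine(string.Join(" ", g));
}
// hdr1 abc def ghi
// hdr2 lmn opq
// hdr3 rst xyz
```

No lambda captures mutable state here, so the result does not depend on when or how often a query is enumerated.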

CodePudding user response:

You get two groups because one group is the group of elements starting with "hdr" and the other group is the group of elements not starting with "hdr". StartsWith returns a bool, so this results in two groups having the Keys false and true.
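
To see the two bool keys for yourself, a quick check along these lines should do (a small sketch using the list from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var test = new List<string> {
    "hdr1","abc","def","ghi","hdr2","lmn","opq","hdr3","rst","xyz"
};

// The key selector returns a bool, so there can never be more than two keys.
foreach (var g in test.GroupBy(s => s.StartsWith("hdr")))
{
    Console.WriteLine($"{g.Key}: {string.Join(",", g)}");
}
// True: hdr1,hdr2,hdr3
// False: abc,def,ghi,lmn,opq,rst,xyz
```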

You can use statement lambdas (lambdas with a block body) in LINQ method syntax. This enables us to do:

string header = null;
var groups = test
    .Select(s => {
        if (s.StartsWith("hdr")) header = s;
        return s;
    })
    .Where(s => header != s)
    .GroupBy(s => header);

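We store the most recent header in the variable header. The Where clause drops the header itself, since it already serves as the group key. This works because LINQ to Objects passes each item through Select, Where, and the GroupBy key selector in turn, so header is up to date when the key is computed.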

The following test...

foreach (var g in groups) {
    Console.WriteLine(g.Key);
    foreach (var item in g) {
        Console.WriteLine("    " + item);
    }
}

... prints this with the given list:

hdr1
    abc
    def
    ghi
hdr2
    lmn
    opq
hdr3
    rst
    xyz

Instead, we can also create lists with the header as first element:

string header = null;
IEnumerable<List<string>> lists = test
    .Select(s => {
        if (s.StartsWith("hdr")) {
            header = s;
        }
        return s;
    })
    .GroupBy(s => header)
    .Select(g => g.ToList());

This test...

foreach (var l in lists) {
    foreach (var item in l) {
        Console.Write(item + " ");
    }
    Console.WriteLine();
}

... prints:

hdr1 abc def ghi
hdr2 lmn opq
hdr3 rst xyz
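
One caveat with both variants: they depend on the Select and the GroupBy key selector running item by item. As a sketch of what can go wrong, the following hypothetical variation (the name broken and the extra ToList call are mine) materializes the Select first, so header has already reached its final value before any key is computed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var test = new List<string> {
    "hdr1","abc","def","ghi","hdr2","lmn","opq","hdr3","rst","xyz"
};

string header = null;
var broken = test
    .Select(s => {
        if (s.StartsWith("hdr")) header = s;
        return s;
    })
    .ToList()               // runs the Select over the whole list first
    .GroupBy(s => header)   // header is already "hdr3" for every item
    .ToList();

Console.WriteLine(broken.Count);      // 1
Console.WriteLine(broken[0].Key);     // hdr3
Console.WriteLine(broken[0].Count()); // 10
```

So if you use the side-effect approach, keep the Select and the GroupBy directly chained.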