Question has been edited. See comments. Changes are in italic.
I have the following list of ordered (start/stop) events that are generated by multiple physical device identified by an ID. For simplicity, only data for one device is shown.
Date Hour ID Event IsStart
18/10/2021 10:35:22 1 DeviceConnected True
18/10/2021 10:20:10 1 DeviceConnected True
18/10/2021 10:12:20 1 DeviceConnected False
18/10/2021 10:12:19 1 DeviceConnected False
18/10/2021 08:24:14 1 DeviceConnected True
Over a given period of time, which is usually 24h, I can't have two or more consecutive starts or stops. I need to remove "duplicates".
In the example above, that would mean, after applying a filter using linq:
Date Hour ID Event IsStart
18/10/2021 10:20:10 1 DeviceConnected True
18/10/2021 10:12:19 1 DeviceConnected False
18/10/2021 08:24:14 1 DeviceConnected True
A start should be followed by a stop or nothing, and the reverse.
This is a typical Gaps-and-Islands problem, right? Let's add the Island number to the initial data set. Unless I'm wrong we should end with something like:
Date Hour ID Event IsStart Island
18/10/2021 10:35:22 1 DeviceConnected True 3
18/10/2021 10:20:10 1 DeviceConnected True 3
18/10/2021 10:12:20 1 DeviceConnected False 2
18/10/2021 10:12:19 1 DeviceConnected False 2
18/10/2021 08:24:14 1 DeviceConnected True 1
Can I do that with Linq? If so, I should be able to retain only the first record on the island.
CodePudding user response:
I don't think you can accomplish this with out-of-the-box LINQ functions, but by writing a quick "lag" extension method you can then accomplish it with a single LINQ statement:
public static class EnumerableExtensions
{
public static IEnumerable<TResult> Lag<TSource, TResult>(
this IEnumerable<TSource> source,
TSource defaultLagValue,
Func<TSource, TSource, TResult> resultSelector
)
{
TSource lagValue = defaultLagValue;
foreach (var item in source)
{
yield return resultSelector(item, lagValue);
lagValue = item;
}
}
}
Then for the purpose of this answer, I'm creating a class to hold the data (which you should already have):
public class Event
{
public DateTime Date { get; set; }
public string Name { get; set; }
public bool IsStart { get; set; }
}
Then, here's how you would run it:
var data = new List<Event>();
data.Add(new Event { Date = new DateTime(2021, 10, 18, 10, 35, 22), Name = "DeviceConnected", IsStart = true });
data.Add(new Event { Date = new DateTime(2021, 10, 18, 10, 20, 10), Name = "DeviceConnected", IsStart = true });
data.Add(new Event { Date = new DateTime(2021, 10, 18, 10, 12, 20), Name = "DeviceConnected", IsStart = false });
data.Add(new Event { Date = new DateTime(2021, 10, 18, 10, 12, 19), Name = "DeviceConnected", IsStart = false });
data.Add(new Event { Date = new DateTime(2021, 10, 18, 8, 24, 14), Name = "DeviceConnected", IsStart = true });
List<Event> filteredData = data
.OrderBy(e => e.Date)
.Lag(null, (e, lag) => new {
Event = e,
PreviousItem = lag,
})
.Where(x => x.PreviousItem == null || x.Event.IsStart != x.PreviousItem.IsStart)
.Select(x => x.Event)
.OrderByDescending(e => e.Date)
.ToList();
After that, filteredData
should contain your expected output similar to this:
Date Hour Event IsStart
18/10/2021 10:20:10 DeviceConnected True
18/10/2021 10:12:19 DeviceConnected False
18/10/2021 08:24:14 DeviceConnected True