Home > database >  Filter out events order by time based on their value
Filter out events order by time based on their value

Time:11-03

Question has been edited. See comments. Changes are in italic.

I have the following list of ordered (start/stop) events that are generated by multiple physical device identified by an ID. For simplicity, only data for one device is shown.

Date       Hour     ID Event           IsStart
18/10/2021 10:35:22 1  DeviceConnected  True
18/10/2021 10:20:10 1  DeviceConnected  True
18/10/2021 10:12:20 1  DeviceConnected  False
18/10/2021 10:12:19 1  DeviceConnected  False
18/10/2021 08:24:14 1  DeviceConnected  True

Over a given period of time, which is usually 24h, I can't have two or more consecutive starts or stops. I need to remove "duplicates".

In the example above, that would mean, after applying a filter using linq:

Date       Hour     ID Event           IsStart
18/10/2021 10:20:10 1  DeviceConnected  True
18/10/2021 10:12:19 1  DeviceConnected  False
18/10/2021 08:24:14 1  DeviceConnected  True

A start should be followed by a stop or nothing, and the reverse.

This is a typical Gaps-and-Islands problem, right? Let's add the Island number to the initial data set. Unless I'm wrong we should end with something like:

Date       Hour     ID Event           IsStart Island
18/10/2021 10:35:22 1  DeviceConnected  True    3
18/10/2021 10:20:10 1  DeviceConnected  True    3
18/10/2021 10:12:20 1  DeviceConnected  False   2
18/10/2021 10:12:19 1  DeviceConnected  False   2
18/10/2021 08:24:14 1  DeviceConnected  True    1

Can I do that with Linq? If so, I should be able to retain only the first record on the island.

CodePudding user response:

I don't think you can accomplish this with out-of-the-box LINQ functions, but by writing a quick "lag" extension method you can then accomplish it with a single LINQ statement:

public static class EnumerableExtensions 
{
    public static IEnumerable<TResult> Lag<TSource, TResult>(
        this IEnumerable<TSource> source,
        TSource defaultLagValue,
        Func<TSource, TSource, TResult> resultSelector
    )
    {
        TSource lagValue = defaultLagValue;
        foreach (var item in source)
        {
            yield return resultSelector(item, lagValue);
            lagValue = item;
        }
    }
}

Then for the purpose of this answer, I'm creating a class to hold the data (which you should already have):

public class Event
{
    public DateTime Date { get; set; }
    public string Name { get; set; }
    public bool IsStart { get; set; }
}

Then, here's how you would run it:

var data = new List<Event>();
data.Add(new Event { Date = new DateTime(2021, 10, 18, 10, 35, 22), Name = "DeviceConnected", IsStart = true  });
data.Add(new Event { Date = new DateTime(2021, 10, 18, 10, 20, 10), Name = "DeviceConnected", IsStart = true  });
data.Add(new Event { Date = new DateTime(2021, 10, 18, 10, 12, 20), Name = "DeviceConnected", IsStart = false });
data.Add(new Event { Date = new DateTime(2021, 10, 18, 10, 12, 19), Name = "DeviceConnected", IsStart = false });
data.Add(new Event { Date = new DateTime(2021, 10, 18,  8, 24, 14), Name = "DeviceConnected", IsStart = true  });

List<Event> filteredData = data
    .OrderBy(e => e.Date)
    .Lag(null, (e, lag) => new {
        Event = e,
        PreviousItem = lag,
    })
    .Where(x => x.PreviousItem == null || x.Event.IsStart != x.PreviousItem.IsStart)
    .Select(x => x.Event)
    .OrderByDescending(e => e.Date)
    .ToList();

After that, filteredData should contain your expected output similar to this:

Date       Hour     Event           IsStart
18/10/2021 10:20:10 DeviceConnected True
18/10/2021 10:12:19 DeviceConnected False
18/10/2021 08:24:14 DeviceConnected True
  • Related