How do I find average difference between a sequence of timestamps in C# using LINQ?

Time:10-23

I have an unordered sequence of timestamps. I need to calculate the min, max and average difference between each pair of consecutive timestamps once they are sorted, e.g. given:

DateTimeOffset now = new DateTimeOffset(new DateTime(2022, 1, 1, 0, 0, 0, 0));
DateTimeOffset[] timestamps = new[] {
    now,
    now.AddSeconds(5),
    now.AddSeconds(10),
    now.AddSeconds(15),
    now.AddSeconds(30),
    now.AddSeconds(31)
};
    
IEnumerable<DateTimeOffset> timestampsSorted = timestamps.OrderByDescending(x => x);

Should produce:

2022-01-01 00:00:31->2022-01-01 00:00:30 | 00:00:01
2022-01-01 00:00:30->2022-01-01 00:00:15 | 00:00:15
2022-01-01 00:00:15->2022-01-01 00:00:10 | 00:00:05
2022-01-01 00:00:10->2022-01-01 00:00:05 | 00:00:05
2022-01-01 00:00:05->2022-01-01 00:00:00 | 00:00:05

Min 00:00:01
Max 00:00:15
Avg 00:00:06.2000000

The procedural solution I have come up with is below. It would be great if I could simplify this using LINQ.

TimeSpan min = TimeSpan.MaxValue;
TimeSpan max = TimeSpan.MinValue;
List<TimeSpan> deltas = new();

for (int i = timestamps.Length - 1; i > 0; i--)
{
    DateTimeOffset later = timestamps[i];
    DateTimeOffset prev = timestamps[i - 1];

    TimeSpan delta = later - prev;
    
    if (delta > max) { max = delta; }
    if (delta < min) { min = delta; }

    deltas.Add(delta);
    Console.WriteLine($"{later:yyyy-MM-dd HH:mm:ss}->{prev:yyyy-MM-dd HH:mm:ss} | {delta}");
}

var result = new { 
    Min = min,
    Max = max,
    Avg = TimeSpan.FromMilliseconds(deltas.Average(d => d.TotalMilliseconds))
};

CodePudding user response:

You can use LINQ's built-in Min, Max and Average methods:

var timestampsSorted = timestamps.OrderByDescending(o => o).ToArray();
var data = timestampsSorted
    .Skip(1)
    .Select((o, i) => timestampsSorted[i] - o)
    .ToArray();
var min = data.Min();
var max = data.Max();
var avg = TimeSpan.FromSeconds(data.Average(o => o.TotalSeconds));

Note that the separate calls to these Min, Max and Average functions result in 3 iterations over the items in the data array.
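
If the index arithmetic inside Select feels fragile, one alternative is to pair each element with its successor using Zip. This is only a sketch: the ZipDeltas class and its Compute method are names made up for illustration, and it sorts descending just like the code above.

```csharp
using System;
using System.Linq;

static class ZipDeltas
{
    // Hypothetical helper: returns the gaps between consecutive timestamps,
    // newest pair first.
    public static TimeSpan[] Compute(DateTimeOffset[] timestamps)
    {
        DateTimeOffset[] sorted = timestamps.OrderByDescending(t => t).ToArray();

        // Pair each element with the one after it. Zip stops at the end of the
        // shorter sequence, so n timestamps yield n - 1 deltas.
        return sorted
            .Zip(sorted.Skip(1), (later, earlier) => later - earlier)
            .ToArray();
    }
}
```

Min(), Max() and Average() can then be called on the returned array exactly as before. For fewer than two timestamps the array is empty, so those calls would still throw.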

CodePudding user response:

You don't need to store all of the delta values in a List<TimeSpan> on which to call Average(); it's more efficient to just keep a running sum and then divide it by the number of pairs compared (timestamps.Length - 1). So this...

// ...
List<TimeSpan> deltas = new();

for (int i = timestamps.Length - 1; i > 0; i--)
{
    // ...
    deltas.Add(delta);
    // ...
}

var result = new {
    // ...
    Avg = TimeSpan.FromMilliseconds(deltas.Average(d => d.TotalMilliseconds))
};

...would be changed to...

// ...
TimeSpan sum = TimeSpan.Zero;

for (int i = timestamps.Length - 1; i > 0; i--)
{
    // ...
    sum += delta;
    // ...
}

var result = new { 
    // ...
    //TODO: Avoid division for sequences with fewer than 2 elements, if expected
    Avg = TimeSpan.FromMilliseconds(sum.TotalMilliseconds / (timestamps.Length - 1))
};
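
Putting that together and handling the TODO, a self-contained version of the loop might look like the sketch below. RunningDeltaStats and Compute are hypothetical names, the input is assumed to be sorted in ascending order already, and returning null for fewer than two timestamps is just one way to deal with that case.

```csharp
using System;

static class RunningDeltaStats
{
    // Hypothetical helper; expects timestamps already sorted in ascending order.
    public static (TimeSpan Min, TimeSpan Max, TimeSpan Avg)? Compute(DateTimeOffset[] timestamps)
    {
        // Fewer than two elements means there are no pairs to compare.
        if (timestamps.Length < 2)
        {
            return null;
        }

        TimeSpan min = TimeSpan.MaxValue;
        TimeSpan max = TimeSpan.MinValue;
        TimeSpan sum = TimeSpan.Zero;

        for (int i = timestamps.Length - 1; i > 0; i--)
        {
            TimeSpan delta = timestamps[i] - timestamps[i - 1];

            if (delta > max) { max = delta; }
            if (delta < min) { min = delta; }
            sum += delta; // running sum replaces the List<TimeSpan>
        }

        // Integer division on ticks; any remainder smaller than one tick is dropped.
        TimeSpan avg = new TimeSpan(sum.Ticks / (timestamps.Length - 1));
        return (min, max, avg);
    }
}
```

The caller checks for null before reading Min, Max and Avg, which replaces the exception the unguarded division would otherwise throw.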

Aggregate() is what you'd use to accumulate one or more values over the course of a sequence. Here's a method that uses Aggregate() to calculate the same values as your for loop...

static (TimeSpan? Minimum, TimeSpan? Maximum, TimeSpan? Average, int Count) GetDeltaStatistics(IEnumerable<DateTimeOffset> timestamps)
{
    var seed = (
        Previous: (DateTimeOffset?) null,
        Minimum: (TimeSpan?) null,
        Maximum: (TimeSpan?) null,
        Sum: TimeSpan.Zero,
        Count: 0
    );

    return timestamps.Aggregate(
        seed,
        (accumulator, current) => {
            if (accumulator.Previous != null)
            {
                TimeSpan delta = current - accumulator.Previous.Value;

                if (++accumulator.Count > 1)
                {
                    // This is not the first comparison; Minimum and Maximum are non-null
                    accumulator.Minimum = delta < accumulator.Minimum!.Value ? delta : accumulator.Minimum.Value;
                    accumulator.Maximum = delta > accumulator.Maximum!.Value ? delta : accumulator.Maximum.Value;
                }
                else
                {
                    // No prior comparisons have been performed
                    // Minimum and Maximum must be null so unconditionally overwrite them
                    accumulator.Minimum = accumulator.Maximum = delta;
                }
                accumulator.Sum += delta;

                Console.WriteLine($"{current:yyyy-MM-dd HH:mm:ss}->{accumulator.Previous:yyyy-MM-dd HH:mm:ss} | {delta}");
            }
            accumulator.Previous = current;

            return accumulator;
        },
        accumulator => (
            accumulator.Minimum,
            accumulator.Maximum,
            Average: accumulator.Count > 0
                ? new TimeSpan(accumulator.Sum.Ticks / accumulator.Count)
                : (TimeSpan?) null,
            accumulator.Count
        )
    );
}

The first parameter of this overload of Aggregate() provides the initial value of the state (accumulator). The second parameter is a Func<> that is passed the state returned from the previous invocation of the Func<> (accumulator) and the current element in the sequence (current), and returns the updated state. The third parameter is a Func<> that transforms the final value of this state to the return value of Aggregate(). The state and the return value are both value tuples.
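
To see the three parameters in isolation, here is a stripped-down sketch that uses the same overload to average a few integers. AggregateDemo and Mean are hypothetical names chosen for this example only; the structure (tuple seed, accumulator lambda, result selector) mirrors the method above.

```csharp
using System;
using System.Linq;

static class AggregateDemo
{
    // Hypothetical helper: averages the values in one pass,
    // returning null for an empty sequence.
    public static double? Mean(int[] values)
    {
        return values.Aggregate(
            // First parameter: the initial state.
            (Sum: 0L, Count: 0),
            // Second parameter: folds one element into the state.
            (state, current) => {
                state.Sum += current;
                state.Count++;

                return state;
            },
            // Third parameter: converts the final state to the result.
            state => state.Count > 0
                ? (double) state.Sum / state.Count
                : (double?) null
        );
    }
}
```

Calling AggregateDemo.Mean(new[] { 1, 2, 3, 4 }) returns 2.5, and an empty array returns null instead of throwing.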

Note that this method only needs an IEnumerable<DateTimeOffset> and not an IList<DateTimeOffset> or DateTimeOffset[]. Since there is no random access to adjacent elements, the value of current is carried forward to the next invocation via accumulator.Previous. I also made it the caller's responsibility to provide sorted input, but you could just as easily perform the sort inside the method.

Calling GetDeltaStatistics() with...

static void Main()
{
    DateTimeOffset now = new DateTimeOffset(new DateTime(2022, 1, 1, 0, 0, 0, 0));
    DateTimeOffset[] timestamps = new[] {
        now,
        now.AddSeconds(5),
        now.AddSeconds(10),
        now.AddSeconds(15),
        now.AddSeconds(30),
        now.AddSeconds(31)
    };

    IEnumerable<IEnumerable<DateTimeOffset>> timestampSequences = new IEnumerable<DateTimeOffset>[] {
        timestamps,
        timestamps.Take(2),
        timestamps.Take(1),
        timestamps.Take(0)
    };
    foreach (IEnumerable<DateTimeOffset> sequence in timestampSequences)
    {
        var (minimum, maximum, average, count) = GetDeltaStatistics(sequence.OrderBy(offset => offset));

        Console.WriteLine($"Minimum: {(minimum == null ? "(null)" : minimum)}");
        Console.WriteLine($"Maximum: {(maximum == null ? "(null)" : maximum)}");
        Console.WriteLine($"Average: {(average == null ? "(null)" : average)}");
        Console.WriteLine($"  Count: {count}");
        Console.WriteLine();
    }
}

...produces this output...

2022-01-01 00:00:05->2022-01-01 00:00:00 | 00:00:05
2022-01-01 00:00:10->2022-01-01 00:00:05 | 00:00:05
2022-01-01 00:00:15->2022-01-01 00:00:10 | 00:00:05
2022-01-01 00:00:30->2022-01-01 00:00:15 | 00:00:15
2022-01-01 00:00:31->2022-01-01 00:00:30 | 00:00:01
Minimum: 00:00:01
Maximum: 00:00:15
Average: 00:00:06.2000000
  Count: 5

2022-01-01 00:00:05->2022-01-01 00:00:00 | 00:00:05
Minimum: 00:00:05
Maximum: 00:00:05
Average: 00:00:05
  Count: 1

Minimum: (null)
Maximum: (null)
Average: (null)
  Count: 0

Minimum: (null)
Maximum: (null)
Average: (null)
  Count: 0

Whereas the original code would throw an exception for sequences with fewer than two elements, here the result has a Count of 0 and the other fields are null.
