I have an unordered sequence of timestamps. I need to be able to calculate the min, max and average difference between each pair of consecutive timestamps, e.g. given:
DateTimeOffset now = new DateTimeOffset(new DateTime(2022, 1, 1, 0, 0, 0, 0));
DateTimeOffset[] timestamps = new[] {
    now,
    now.AddSeconds(5),
    now.AddSeconds(10),
    now.AddSeconds(15),
    now.AddSeconds(30),
    now.AddSeconds(31)
};
IEnumerable<DateTimeOffset> timestampsSorted = timestamps.OrderByDescending(x => x);
Should produce:
2022-01-01 00:00:31->2022-01-01 00:00:30 | 00:00:01
2022-01-01 00:00:30->2022-01-01 00:00:15 | 00:00:15
2022-01-01 00:00:15->2022-01-01 00:00:10 | 00:00:05
2022-01-01 00:00:10->2022-01-01 00:00:05 | 00:00:05
2022-01-01 00:00:05->2022-01-01 00:00:00 | 00:00:05
Min 00:00:01
Max 00:00:15
Avg 00:00:06.2000000
The procedural solution I have come up with is below; it would be great if I could simplify this using LINQ.
TimeSpan min = TimeSpan.MaxValue;
TimeSpan max = TimeSpan.MinValue;
List<TimeSpan> deltas = new();
for (int i = timestamps.Length - 1; i > 0; i--)
{
    DateTimeOffset later = timestamps[i];
    DateTimeOffset prev = timestamps[i - 1];
    TimeSpan delta = later - prev;
    if (delta > max) { max = delta; }
    if (delta < min) { min = delta; }
    deltas.Add(delta);
    Console.WriteLine($"{later:yyyy-MM-dd HH:mm:ss}->{prev:yyyy-MM-dd HH:mm:ss} | {delta}");
}
var result = new {
    Min = min,
    Max = max,
    Avg = TimeSpan.FromMilliseconds(deltas.Average(d => d.TotalMilliseconds))
};
CodePudding user response:
Using LINQ's built-in Min, Max and Average functions.
var timestampsSorted = timestamps.OrderByDescending(o => o).ToArray();
var data = timestampsSorted
    .Skip(1)
    .Select((o, i) => timestampsSorted[i] - o)
    .ToArray();
var min = data.Min();
var max = data.Max();
var avg = TimeSpan.FromSeconds(data.Average(o => o.TotalSeconds));
Note that the separate calls to Min, Max and Average result in three iterations over the items in the data array.
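The same pairing can also be written with Zip instead of the indexed Select. The following is only a sketch of that variant, not part of the answer above; the variable names and the shuffled sample data are illustrative assumptions, and Min/Max/Average would throw for fewer than two timestamps because the deltas array would be empty.

```csharp
using System;
using System.Linq;

// Sample data from the question, deliberately shuffled to show that sorting handles it.
DateTimeOffset start = new DateTimeOffset(2022, 1, 1, 0, 0, 0, TimeSpan.Zero);
DateTimeOffset[] timestamps =
{
    start.AddSeconds(30), start, start.AddSeconds(15),
    start.AddSeconds(5), start.AddSeconds(31), start.AddSeconds(10)
};

DateTimeOffset[] sorted = timestamps.OrderByDescending(t => t).ToArray();

// Zip pairs sorted[0] with sorted[1], sorted[1] with sorted[2], and so on;
// it stops when the shorter (skipped) sequence runs out.
TimeSpan[] deltas = sorted
    .Zip(sorted.Skip(1), (later, earlier) => later - earlier)
    .ToArray();

TimeSpan min = deltas.Min();
TimeSpan max = deltas.Max();
// Averaging ticks avoids losing precision in a seconds/milliseconds round trip.
TimeSpan avg = TimeSpan.FromTicks((long)deltas.Average(d => d.Ticks));

Console.WriteLine($"Min {min}"); // Min 00:00:01
Console.WriteLine($"Max {max}"); // Max 00:00:15
Console.WriteLine($"Avg {avg}"); // Avg 00:00:06.2000000
```

This still iterates the deltas array three times; it only changes how the adjacent pairs are formed.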
CodePudding user response:
You don't need to store all of the delta values in a List<TimeSpan> on which to call Average(); it's more efficient to just keep a running sum and then divide it by the number of pairs compared (timestamps.Length - 1). So this...
// ...
List<TimeSpan> deltas = new();
for (int i = timestamps.Length - 1; i > 0; i--)
{
    // ...
    deltas.Add(delta);
    // ...
}
var result = new {
    // ...
    Avg = TimeSpan.FromMilliseconds(deltas.Average(d => d.TotalMilliseconds))
};
...would be changed to...
// ...
TimeSpan sum = TimeSpan.Zero;
for (int i = timestamps.Length - 1; i > 0; i--)
{
    // ...
    sum += delta; // Accumulate the running total of all deltas
    // ...
}
var result = new {
    // ...
    //TODO: Avoid division for sequences with less than 2 elements, if expected
    Avg = TimeSpan.FromMilliseconds(sum.TotalMilliseconds / (timestamps.Length - 1))
};
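As a side note on the running sum: the deltas between consecutive sorted timestamps telescope, so their total equals the latest timestamp minus the earliest. Below is a minimal sketch of that shortcut, assuming only the average is needed (the minimum and maximum still require looking at each pair); the variable names are illustrative and not taken from the question.

```csharp
using System;
using System.Linq;

DateTimeOffset start = new DateTimeOffset(2022, 1, 1, 0, 0, 0, TimeSpan.Zero);
DateTimeOffset[] timestamps =
{
    start.AddSeconds(30), start, start.AddSeconds(15),
    start.AddSeconds(5), start.AddSeconds(31), start.AddSeconds(10)
};

// (t1 - t0) + (t2 - t1) + ... + (tn - t(n-1)) collapses to tn - t0,
// so neither sorting nor a per-pair sum is needed for the average.
int pairCount = timestamps.Length - 1;
TimeSpan? average = pairCount > 0
    ? TimeSpan.FromTicks((timestamps.Max() - timestamps.Min()).Ticks / pairCount)
    : (TimeSpan?)null; // Fewer than two timestamps: there are no deltas to average

Console.WriteLine($"Avg {average}"); // Avg 00:00:06.2000000
```

The integer division truncates to whole ticks (100 ns), which is usually well below the precision that matters here.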
Aggregate() is what you'd use to accumulate one or more values over the course of a sequence. Here's a method that uses Aggregate() to calculate the same values as your for loop...
static (TimeSpan? Minimum, TimeSpan? Maximum, TimeSpan? Average, int Count) GetDeltaStatistics(IEnumerable<DateTimeOffset> timestamps)
{
    var seed = (
        Previous: (DateTimeOffset?) null,
        Minimum: (TimeSpan?) null,
        Maximum: (TimeSpan?) null,
        Sum: TimeSpan.Zero,
        Count: 0
    );
    return timestamps.Aggregate(
        seed,
        (accumulator, current) => {
            if (accumulator.Previous != null)
            {
                TimeSpan delta = current - accumulator.Previous.Value;
                // Count this comparison, then check whether it is the first one
                if (++accumulator.Count > 1)
                {
                    // This is not the first comparison; Minimum and Maximum are non-null
                    accumulator.Minimum = delta < accumulator.Minimum!.Value ? delta : accumulator.Minimum.Value;
                    accumulator.Maximum = delta > accumulator.Maximum!.Value ? delta : accumulator.Maximum.Value;
                }
                else
                {
                    // No prior comparisons have been performed
                    // Minimum and Maximum must be null so unconditionally overwrite them
                    accumulator.Minimum = accumulator.Maximum = delta;
                }
                accumulator.Sum += delta;
                Console.WriteLine($"{current:yyyy-MM-dd HH:mm:ss}->{accumulator.Previous:yyyy-MM-dd HH:mm:ss} | {delta}");
            }
            accumulator.Previous = current;
            return accumulator;
        },
        accumulator => (
            accumulator.Minimum,
            accumulator.Maximum,
            Average: accumulator.Count > 0
                ? new TimeSpan(accumulator.Sum.Ticks / accumulator.Count)
                : (TimeSpan?) null,
            accumulator.Count
        )
    );
}
The second parameter of this overload of Aggregate() is a Func<> that is passed the current element in the sequence (current) and the state that was returned from the previous invocation of the Func<> (accumulator). The first parameter provides the initial value of accumulator. The third parameter is a Func<> that transforms the final value of this state to the return value of Aggregate(). The state and return value are both value tuples.
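To see the three parameters in isolation, here is a smaller sketch of the same Aggregate() overload that computes the mean of some integers. The sample values and tuple field names are made up for illustration, and unlike the method above it returns a new tuple on each step rather than mutating the accumulator.

```csharp
using System;
using System.Linq;

int[] values = { 4, 8, 15, 16, 23, 42 };

double? mean = values.Aggregate(
    // 1st parameter: the seed, i.e. the initial accumulator state
    (Sum: 0L, Count: 0),
    // 2nd parameter: folds the current element into the accumulator
    (accumulator, current) => (accumulator.Sum + current, accumulator.Count + 1),
    // 3rd parameter: converts the final accumulator into the result
    accumulator => accumulator.Count > 0
        ? (double)accumulator.Sum / accumulator.Count
        : (double?)null
);

Console.WriteLine($"Mean: {mean}");
```

For an empty array the second function is never invoked, so the third one receives the seed unchanged and mean is null.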
Note that GetDeltaStatistics() only needs an IEnumerable<DateTimeOffset> and not an IList<DateTimeOffset> or DateTimeOffset[]; since there is no random access to adjacent elements, though, the value of current is carried forward to the next invocation via accumulator.Previous. I also made it the caller's responsibility to provide sorted input, but you could just as easily perform that inside the method.
Calling GetDeltaStatistics() with...
static void Main()
{
    DateTimeOffset now = new DateTimeOffset(new DateTime(2022, 1, 1, 0, 0, 0, 0));
    DateTimeOffset[] timestamps = new[] {
        now,
        now.AddSeconds(5),
        now.AddSeconds(10),
        now.AddSeconds(15),
        now.AddSeconds(30),
        now.AddSeconds(31)
    };
    IEnumerable<IEnumerable<DateTimeOffset>> timestampSequences = new IEnumerable<DateTimeOffset>[] {
        timestamps,
        timestamps.Take(2),
        timestamps.Take(1),
        timestamps.Take(0)
    };
    foreach (IEnumerable<DateTimeOffset> sequence in timestampSequences)
    {
        var (minimum, maximum, average, count) = GetDeltaStatistics(sequence.OrderBy(offset => offset));
        // ToString() gives both branches of each conditional the same type (string)
        Console.WriteLine($"Minimum: {(minimum == null ? "(null)" : minimum.ToString())}");
        Console.WriteLine($"Maximum: {(maximum == null ? "(null)" : maximum.ToString())}");
        Console.WriteLine($"Average: {(average == null ? "(null)" : average.ToString())}");
        Console.WriteLine($" Count: {count}");
        Console.WriteLine();
    }
}
...produces this output...
2022-01-01 00:00:05->2022-01-01 00:00:00 | 00:00:05
2022-01-01 00:00:10->2022-01-01 00:00:05 | 00:00:05
2022-01-01 00:00:15->2022-01-01 00:00:10 | 00:00:05
2022-01-01 00:00:30->2022-01-01 00:00:15 | 00:00:15
2022-01-01 00:00:31->2022-01-01 00:00:30 | 00:00:01
Minimum: 00:00:01
Maximum: 00:00:15
Average: 00:00:06.2000000
 Count: 5

2022-01-01 00:00:05->2022-01-01 00:00:00 | 00:00:05
Minimum: 00:00:05
Maximum: 00:00:05
Average: 00:00:05
 Count: 1

Minimum: (null)
Maximum: (null)
Average: (null)
 Count: 0

Minimum: (null)
Maximum: (null)
Average: (null)
 Count: 0
Whereas the original code would cause an exception to be thrown, for sequences with fewer than two elements the result has a Count of 0 and the other fields are null.