Home > other >  Select items based on two constraints keeping the computational effort low
Select items based on two constraints keeping the computational effort low

Time:10-14

I have a list of items and I want to select all items that are valid for a given date. Each item contains a unique identifier, some info and a start and end date. The data structure of these items need to be created only once, but it contains 100.000 items and there are a lot of requests. I was thinking that the data structure is sorted descending by e.g. "endDate" and then I search (using binary search) for the first "endDate" that is not valid anymore, meaning the endDate is before the given date - for example I have a given date of 29.07.2022, then I would know I could discard:

endDate
-------------
30.07.2022 
28.07.2022 <-
26.07.2022 <-
23.07.2022 <-

That would keep the computational effort low. But then I have to sort the data again for startDate. Is there a better way to do this? What data structure would you recommand here?

var itemList = new List<Item>();

class Item
{
     public string id {get; set;}
     public string startDate { get; set; }
     public string endDate { get; set; }
     public string someInfo { get; set; }  
}

Thank you!

CodePudding user response:

This answer assumes that you have frequent insertions and deletions as well as frequent lookups.

If you want to look things up using two different sorts (for performance reasons), you'll need to maintain two different sorted containers.

For this, you can use SortedSet<T>. This gives you O(Log(N)) lookups, insertions and deletions.

In order to use SortedSet<T> with your Item class you will need to add implementations of Equals() and GetHashCode(), because they are used when adding items to a hashing container such as SortedSet<T>.

You will also need to ensure that all the properties used for identity and sorting are immutable - because otherwise if you change those properties of an item that is currently stored in a hashing or sorted container, it will break.

So your Item class would have to look something like this:

public sealed class Item: IEquatable<Item>
{
    public Item(string id, DateTime startDate, DateTime endDate, string someInfo)
    {
        Id        = id;
        StartDate = startDate;
        EndDate   = endDate;
        SomeInfo  = someInfo;
    }

    public string   Id        { get; } // Used for identity therefore must be immutable.
    public DateTime StartDate { get; } // Used for sorting therefore must be immutable.
    public DateTime EndDate   { get; } // Used for sorting therefore must be immutable.
    public string   SomeInfo  { get; set; } // Not used for sorting or identity, so can be mutable.

    public bool Equals(Item? other)
    {
        if (other is null)
            return false;

        if (ReferenceEquals(this, other))
            return true;

        return Id == other.Id;
    }

    public override bool Equals(object? obj)
    {
        return ReferenceEquals(this, obj) || obj is Item other && Equals(other);
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

Now you need a way of comparing Item objects by start date and by end date, for use with the two differently-sorted SortedSet<T> collections. We can do this by writing a small class that implements IComparer<Item>:

sealed class ItemComparer : IComparer<Item>
{
    public ItemComparer(bool compareStart)
    {
        _compareStart = compareStart;
    }

    public int Compare(Item? x, Item? y)
    {
        int byDate = _compareStart 
            ? x!.StartDate.CompareTo(y!.StartDate) 
            : x!.EndDate  .CompareTo(y!.EndDate);

        if (byDate != 0)
            return byDate;

        return x.Id.CompareTo(y.Id);
    }

    readonly bool _compareStart;
}

The constructor for this class accepts a bool which defines whether it will sort by StartDate or by EndDate.

Now you can write a class that encapsulates the two differently-sorted SortedSet<T> collections and provides higher-level methods to add, remove and search for items. This class will also contain a nested implementation of the ItemComparer class above, since this is merely an implementation detail that doesn't need to be exposed:

public sealed class ItemStore
{
    public bool Add(Item item)
    {
        _byEnd.Add(item);
        return _byStart.Add(item);
    }

    public bool Remove(Item item)
    {
        _byEnd.Remove(item);
        return _byStart.Remove(item);
    }

    public void Clear()
    {
        _byStart.Clear(); 
        _byEnd  .Clear();
    }

    public IEnumerable<Item> ItemsAfterEndDate(DateTime date)
    {
        date = date.AddDays(1);  // For AFTER the end date.

        var start = new Item("", date, date, "");
        var end   = new Item("", DateTime.MaxValue, DateTime.MaxValue, "");

        return _byEnd.GetViewBetween(start, end);
    }

    public IEnumerable<Item> ItemsBeforeEndDate(DateTime date)
    {
        date = date.AddDays(-1); // For BEFORE the start date.

        var start = new Item("", DateTime.MinValue, DateTime.MinValue, "");
        var end   = new Item("", date,              date,              "");

        return _byEnd.GetViewBetween(start, end);
    }

    public IEnumerable<Item> ItemsAfterStartDate(DateTime date)
    {
        date = date.AddDays(1); // For AFTER the start date.

        var start = new Item("", date,              date,              "");
        var end   = new Item("", DateTime.MaxValue, DateTime.MaxValue, "");

        return _byStart.GetViewBetween(start, end);
    }

    public IEnumerable<Item> ItemsBeforeStartDate(DateTime date)
    {
        date = date.AddDays(-1); // For BEFORE the start date.

        var start = new Item("", DateTime.MinValue, DateTime.MinValue, "");
        var end   = new Item("", date,              date,              "");

        return _byStart.GetViewBetween(start, end);
    }

    sealed class ItemComparer : IComparer<Item>
    {
        public ItemComparer(bool compareStart)
        {
            _compareStart = compareStart;
        }

        public int Compare(Item? x, Item? y)
        {
            int byDate = _compareStart 
                ? x!.StartDate.CompareTo(y!.StartDate) 
                : x!.EndDate  .CompareTo(y!.EndDate);

            if (byDate != 0)
                return byDate;

            return x.Id.CompareTo(y.Id);
        }

        readonly bool _compareStart;
    }

    readonly SortedSet<Item> _byStart = new(_byStartComparer);
    readonly SortedSet<Item> _byEnd   = new(_byEndComparer);

    static readonly IComparer<Item> _byStartComparer = new ItemComparer(compareStart: true);
    static readonly IComparer<Item> _byEndComparer   = new ItemComparer(compareStart: false);
}

You should be able to see how to add further methods to this class, if you need to add more functionality.

You can test this class with code such as this:

static void Main()
{
    var items = new ItemStore();

    items.Add(new Item("1", DateTime.Parse("2022-07-01"), DateTime.Parse("2022-08-01"), "1"));
    items.Add(new Item("2", DateTime.Parse("2022-08-01"), DateTime.Parse("2022-09-01"), "2"));
    items.Add(new Item("3", DateTime.Parse("2022-09-01"), DateTime.Parse("2022-10-01"), "3"));
    items.Add(new Item("4", DateTime.Parse("2022-10-01"), DateTime.Parse("2022-11-01"), "4"));

    items.Add(new Item("1.1", DateTime.Parse("2022-07-01"), DateTime.Parse("2022-08-01"), "1.1"));
    items.Add(new Item("2.1", DateTime.Parse("2022-08-01"), DateTime.Parse("2022-09-01"), "2.1"));
    items.Add(new Item("3.1", DateTime.Parse("2022-09-01"), DateTime.Parse("2022-10-01"), "3.1"));
    items.Add(new Item("4.1", DateTime.Parse("2022-10-01"), DateTime.Parse("2022-11-01"), "4.1"));

    Console.WriteLine("Items with start date before 2022-09-01");

    foreach (var item in items.ItemsBeforeStartDate(DateTime.Parse("2022-09-01")))
    {
        Console.WriteLine(item.Id);
    }

    Console.WriteLine("\nItems with end date after 2022-09-01");

    foreach (var item in items.ItemsAfterEndDate(DateTime.Parse("2022-09-01")))
    {
        Console.WriteLine(item.Id);
    }
}
  • Related