While converting the list in batches, yield return skips object that has been checked in if condition but not added to bucket because of size constraints.
Total message count: 4
1st bucket count: 2
2nd bucket count: 1
3rd message from the message list is getting skipped.
Here I am creating buckets of size 250kb. Is there any other way to save the correct state or do I need to use for loop?
public static IEnumerable<IEnumerable<Message>> GetBatchSize(IList<Message> source)
{
List<Message> bucket = null;
long size = 0;
foreach (var item in source)
{
if (bucket == null)
{
bucket = new List<Message>();
}
size = size item.Size;
if (250 - (size / 1024) >= item.Size / 1024)
{
bucket.Add(item);
continue;
}
yield return bucket;
bucket = new List<Message>();
size = 0;
}
if (bucket?.Count > 0 && size<250)
{
yield return bucket;
}
}
CodePudding user response:
You can add the item to the list that you create newly after the yield return:
foreach (var item in source)
{
// omitted for brevity
yield return bucket;
bucket = new List<Message> { item };
size = item.Size;
}
CodePudding user response:
(Assuming that no single message's size is greater than the bucket size)
First, when the current bucket can't contain the current message, you should put the message into the next bucket, instead of throwing it away. So after yield returning in the foreach
loop, you should do:
bucket = new() { item };
^^^^^^^^
Also your condition for adding an item to the bucket is incorrect. It "double counts" the current item's size. Note that size
at this point already has item.Size
added to it! Also, using division here will discard the remainder, which leads to producing incorrect results for cases such as:
GetBatchSize(new List<Message> {
new Message { Size = 250 * 1024 - 1 },
new Message { Size = 1 },
new Message { Size = 1 },
});
Instead of dividing, multiply instead:
if (size <= 250 * 1024)
{
bucket.Add(item);
continue;
}
Also, I'm not sure what the purpose of the last size < 250
check is. You probably meant size < 250 * 1024
, but because of the invariant of the loop, that should always be true, so you can just delete it:
if (bucket?.Count > 0)
{
yield return bucket;
}