I'm trying to split a dataset into chunks of random size. What I'm doing is:
# first I'm defining the frequency for each sample size.
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
weights = [0.025, 0.025, 0.05, 0.1, 0.2, 0.3, 0.1, 0.1, 0.05, 0.05]
# with that done, I want to select 5 chunks of data. So I do:
from random import choices
batch_sizes = []
for _ in range(5):
    n = int(choices(population, weights)[0])
    batch_sizes.append(n)
# where the output looks like:
batch_sizes = [3, 4, 4, 5, 2]
# data is something like this:
data = [0, 1, 2, 3, 4, 5, 6, 7, ... , 1000001, 1000002, 1000003]
# What I want is, using the batch_sizes presented above:
[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10]
# and so on.
My problem is: how do I iterate over data, taking a different sample size each time?
I already tried:
for i in range(0, len(data)-batch_size+1, batch_size):
    batch = data[i:i+batch_size]
    print('Batch: ', batch)
but that was not successful, since it uses a single fixed batch_size and I can't iterate over the batch sizes as well.
CodePudding user response:
Use a loop:
start = 0
for n in batch_sizes:
    stop = start + n
    # print for the demo, you can also add to a list
    print(data[start:stop])
    start = stop
Output (here with batch_sizes = [5, 5, 5, 5, 5]):
[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[10, 11, 12, 13, 14]
[15, 16, 17, 18, 19]
[20, 21, 22, 23, 24]
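The same loop can also be wrapped in a generator, so batches are produced one at a time instead of printed. This is only a minimal sketch: `iter_batches` is a hypothetical helper name, and the 20-element `data` is a small stand-in for the real dataset.

```python
def iter_batches(data, batch_sizes):
    """Yield consecutive slices of `data`, one per entry in `batch_sizes`."""
    start = 0
    for n in batch_sizes:
        stop = start + n
        yield data[start:stop]
        start = stop  # the next batch begins where this one ended


data = list(range(20))  # small stand-in for the real data
batch_sizes = [3, 4, 4, 5, 2]
batches = list(iter_batches(data, batch_sizes))
print(batches)
```

Collecting with `list(...)` is only for the demo; iterating over `iter_batches(...)` directly avoids holding all batches in memory.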
CodePudding user response:
Maybe you could try something like this:
i = 0
for batch_size in batch_sizes:
    batch = data[i:i+batch_size]
    i += batch_size
    print('Batch: ', batch)
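If the sizes should keep being drawn until the data runs out, rather than being fixed to five in advance, the same running-offset idea might be adapted as below. This is a sketch under assumptions: `population` and `weights` are copied from the question, while `random_batches` and its `seed` parameter are hypothetical names introduced for illustration.

```python
import random

population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
weights = [0.025, 0.025, 0.05, 0.1, 0.2, 0.3, 0.1, 0.1, 0.05, 0.05]


def random_batches(data, population, weights, seed=None):
    """Yield consecutive slices of `data` whose sizes are drawn at random."""
    rng = random.Random(seed)  # local generator, so a seed gives repeatable runs
    i = 0
    while i < len(data):
        batch_size = rng.choices(population, weights)[0]
        yield data[i:i + batch_size]  # the final slice may be shorter
        i += batch_size


data = list(range(100))  # small stand-in for the real data
batches = list(random_batches(data, population, weights, seed=0))
print(batches[:3])
```

Because slicing past the end of a list is safe in Python, the last batch simply contains whatever elements remain.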
CodePudding user response:
You can use this "one-line" assignment:
data = [v for v in range(20)]
batch_sizes = [3, 4, 4, 5, 2]
result = [data[sum(batch_sizes[:i]):sum(batch_sizes[:i])+batch_sizes[i]] for i in range(len(batch_sizes))]
print(result)
print(result)
result:
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17]]
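Note that each `sum(batch_sizes[:i])` re-adds the whole prefix, so the one-liner does quadratic work in the number of batches. For long lists of sizes, one possible alternative is to compute the running offsets once with `itertools.accumulate`. A sketch, where `stops` and `starts` are names chosen here for illustration:

```python
from itertools import accumulate

data = list(range(20))
batch_sizes = [3, 4, 4, 5, 2]

# Running totals of the sizes are the end offsets: [3, 7, 11, 16, 18]
stops = list(accumulate(batch_sizes))
# Each batch starts where the previous one stopped
starts = [0] + stops[:-1]
result = [data[a:b] for a, b in zip(starts, stops)]
print(result)
```

This produces the same nested list as the one-liner.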