I would like to ask for help in optimizing the code. I have a list of let's say 26 elements:
indata = [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85, 44, 89, 26, 0, 67, 67, 23, 0, 0]
A note on terminology: by "subset" I mean a sub-list (a contiguous slice of the data), not Python's set data type.
I'm preparing a function that will perform further calculations on subsets of that list. The problem is that when the list length doesn't divide evenly, the same element sometimes ends up in two or more subsets. The subsets I'm looking for are:
- subset 1 => raw data
- subsets 2 - 3 => the first and second half of the data
- subsets 4 - 7 => the first, second, third and fourth quarter of the data
- subsets 8 - 15 => each successive eighth of the data
I came up with a rather sloppy and long solution inside the function body, which goes like this:
for i in iterate:
    if i == 0:
        subset = indata
    elif i == 1:
        subset = indata[0:int(len(indata)/2)]
    elif i == 2:
        subset = indata[int(len(indata)/2):]
    elif i == 3:
        subset = indata[0:int(len(indata)/4)]
    elif i == 4:
        subset = indata[int(len(indata)/4):int(round((len(indata)/4)*2,0))]
    elif i == 5:
        subset = indata[int(round((len(indata)/4)*2,0)):int(round((len(indata)/4)*3,0))]
    elif i == 6:
        subset = indata[int(round((len(indata)/4)*3,0)):]
    elif i == 7:
        subset = indata[0:int(len(indata)/8)]
    elif i == 8:
        subset = indata[int(len(indata)/8):int(round((len(indata)/8)*2,0))]
    elif i == 9:
        subset = indata[int(len(indata)/8)*2:int(round((len(indata)/8)*3,0))]
    elif i == 10:
        subset = indata[int((len(indata)/8)*3 + 0.25):int(round((len(indata)/8)*4,0))]
    elif i == 11:
        subset = indata[int((len(indata)/8)*4 + 0.25):int(round((len(indata)/8)*5,0))]
    elif i == 12:
        subset = indata[int((len(indata)/8)*5 + 0.25):int(round((len(indata)/8)*6,0))]
    elif i == 13:
        subset = indata[int((len(indata)/8)*6 + 0.5):int(round((len(indata)/8)*7,0))]
    elif i == 14:
        subset = indata[int((len(indata)/8)*7 + 0.5):]
    else:
        subset = indata[int((len(indata)/8)*7 + 0.5):]
(Further instructions on the subset go here, then the loop goes back and repeats.)
It does what it should (the added 0.25 and 0.5 offsets keep the same element from landing in two or more subsets when, say, the subset length is 3.25). However, there must be a better way to do this. I don't mind uneven subsets: for example, dividing by 4 could give two 7-element lists and two 6-element lists, as long as no element is repeated across them.
Thank you for your help.
CodePudding user response:
You can use a list comprehension to obtain these subsets:
indata = [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85,
          44, 89, 26, 0, 67, 67, 23, 0, 0]
subsets = [indata[p*size:(p+1)*size]
           for parts in (1,2,4,8)
           for size in [len(indata)//parts]
           for p in range(parts)]
Output:
for i,subset in enumerate(subsets,1): print(i,subset)
1 [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85, 44,
89, 26, 0, 67, 67, 23, 0, 0]
2 [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42]
3 [30, 16, 14, 85, 44, 89, 26, 0, 67, 67, 23, 0, 0]
4 [0, 0, 50, 0, 32, 35]
5 [151, 163, 9, 1, 3, 3]
6 [42, 30, 16, 14, 85, 44]
7 [89, 26, 0, 67, 67, 23]
8 [0, 0, 50]
9 [0, 32, 35]
10 [151, 163, 9]
11 [1, 3, 3]
12 [42, 30, 16]
13 [14, 85, 44]
14 [89, 26, 0]
15 [67, 67, 23]
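Note that in the output above, the quarter and eighth subsets stop at 23: because `len(indata)//parts` rounds down, the two trailing zeros are not in any of them. If every element has to land in exactly one subset, one possible variation (a sketch, not part of the answer above; the helper name `split_even` is made up for illustration) is to compute the slice boundaries as `p * n // parts` instead of using a fixed size:

```python
indata = [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85,
          44, 89, 26, 0, 67, 67, 23, 0, 0]

def split_even(data, parts):
    """Split data into `parts` contiguous slices that together cover every element.

    Hypothetical helper for illustration only.
    """
    n = len(data)
    # Boundary p is floor(p * n / parts); the last boundary is always n,
    # so nothing is dropped and neighbouring slices never overlap.
    bounds = [p * n // parts for p in range(parts + 1)]
    return [data[lo:hi] for lo, hi in zip(bounds, bounds[1:])]

subsets = [s for parts in (1, 2, 4, 8) for s in split_even(indata, parts)]

for i, subset in enumerate(subsets, 1):
    print(i, subset)
```

With 26 elements this gives quarters of length 6, 7, 6, 7 and eighths of length 3, 3, 3, 4, 3, 3, 3, 4, so the sizes are uneven but each element appears exactly once per level.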
CodePudding user response:
def divide_data(data, chunks):
    idx = 0
    sizes = [len(data) // chunks + int(x < len(data)%chunks) for x in range(chunks)]
    for size in sizes:
        yield data[idx:idx + size]
        idx += size

data = list(range(26)) # or whatever, e.g. [0, 0, 50, ...]
for num_subsets in (1, 2, 4, 8):
    print(f'num subsets: {num_subsets}')
    for subset in divide_data(data, num_subsets):
        print(subset)
num subsets: 1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
num subsets: 2
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
num subsets: 4
[0, 1, 2, 3, 4, 5, 6]
[7, 8, 9, 10, 11, 12, 13]
[14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25]
num subsets: 8
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10]
[11, 12, 13]
[14, 15, 16]
[17, 18, 19]
[20, 21, 22]
[23, 24, 25]
Credit to this answer for inspiration
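To plug this into the loop from the question, the 15 subsets can be collected into one flat list and iterated over directly. A minimal sketch, assuming the same `divide_data` logic as above (rewritten with `divmod` so the snippet runs on its own); the `sum` call is only a placeholder for whatever the "further instructions" on each subset are:

```python
def divide_data(data, chunks):
    # Same idea as the generator above: the first `extra` chunks
    # get one additional element so every item is used exactly once.
    base, extra = divmod(len(data), chunks)
    idx = 0
    for x in range(chunks):
        size = base + (1 if x < extra else 0)
        yield data[idx:idx + size]
        idx += size

indata = [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85,
          44, 89, 26, 0, 67, 67, 23, 0, 0]

# Flat list: whole data, 2 halves, 4 quarters, 8 eighths.
all_subsets = [subset
               for num_subsets in (1, 2, 4, 8)
               for subset in divide_data(indata, num_subsets)]

for i, subset in enumerate(all_subsets):
    result = sum(subset)  # placeholder for the real per-subset calculation
    print(i, len(subset), result)
```

This replaces the long `if`/`elif` chain with a single loop, and the index `i` still runs from 0 to 14 as in the original code.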