I have a long list of integers (or any other items) and want to split it into a list of sublists, given the lengths of those sublists.
For example,
given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lengths = [4, 3, 2, 2]
desired_list = [[1, 2, 3, 4], [5, 6, 7], [8, 9], [10, 11]]
What I tried is
result_list1 = list()
for i in lengths:
    result_list1.append(given_list[:i])
    given_list = given_list[i:]
This seems to work, but it is only a small example. My real data is much bigger, and I doubt this is the best way to do it.
Is there a better way?
CodePudding user response:
Your approach should be fine.
A small improvement is to keep track of the last stopping point instead of re-slicing the list on every iteration:
In [8]: %%timeit
...: given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
...: lengths = [4, 3, 2, 2]
...: result_list1 = list()
...: for i in lengths:
...:     result_list1.append(given_list[:i])
...:     given_list = given_list[i:]
...:
596 ns ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [9]: %%timeit
...: given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
...: lengths = [4, 3, 2, 2]
...: result_list1 = list()
...: start = 0
...: for i in lengths:
...:     result_list1.append(given_list[start:start + i])
...:     start += i
...:
540 ns ± 2.48 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
When the list gets long enough, a list comprehension combined with the accumulate function from itertools starts to perform better:
In [15]: given_list *= 10000
In [16]: lengths *= 10000
In [17]: %%timeit
...: result_list1 = list()
...: start = 0
...: for i in lengths:
...:     result_list1.append(given_list[start:start + i])
...:     start += i
...:
5.32 ms ± 16 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [18]: from itertools import accumulate
In [19]: %%timeit
...: result_list1 = [given_list[a:a + i] for i, a in zip(lengths, accumulate(lengths, initial=0))]
...:
...:
4.61 ms ± 31.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The actual mechanics of it are essentially the same, but this could also allow you to use it as a generator rather than duplicating the entire list in memory.
The only modification to get a generator instead of a list is to replace the outer square brackets with parentheses, i.e.
(given_list[a:a + i] for i, a in zip(lengths, accumulate(lengths, initial=0)))
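To make the lazy behaviour concrete, here is a minimal sketch that wraps the generator expression in a function. The name iter_chunks is only illustrative, and the initial keyword of accumulate assumes Python 3.8 or newer.

```python
from itertools import accumulate


def iter_chunks(items, lengths):
    """Lazily yield consecutive slices of items with the given lengths.

    iter_chunks is a hypothetical helper name, not a library function.
    """
    # Running start offsets: 0, 4, 7, 9, ... (initial= needs Python 3.8+).
    starts = accumulate(lengths, initial=0)
    # zip stops once lengths is exhausted, so the final total is never used.
    return (items[a:a + n] for n, a in zip(lengths, starts))


given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lengths = [4, 3, 2, 2]

chunks = iter_chunks(given_list, lengths)
first = next(chunks)  # [1, 2, 3, 4]
rest = list(chunks)   # [[5, 6, 7], [8, 9], [10, 11]]
```

Each sublist is only built when it is requested, so at most one chunk needs to be held at a time if you process them one by one.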
CodePudding user response:
In your code, given_list = given_list[i:] is not great for performance: every iteration copies the whole remainder of the list, only for the next iteration to slice it and copy it again.
If you are using a version of Python with the walrus operator (3.8+), you can do this in a list comprehension with an initialized variable, updating the beginning of the slice as you go:
given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lengths = [4, 3, 2, 2]
i = 0
[given_list[i:(i := s + i)] for s in lengths]
# [[1, 2, 3, 4], [5, 6, 7], [8, 9], [10, 11]]
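On versions before 3.8, or if the assignment inside the slice is hard to read, the same running offset can be kept in a plain loop. This is only a sketch: split_by_lengths is an illustrative name, and the check that the lengths add up to the size of the list is my own assumption about what you would want when they don't match.

```python
def split_by_lengths(items, lengths):
    """Split items into consecutive sublists of the given lengths.

    split_by_lengths is a hypothetical helper name. The ValueError on a
    length mismatch is an assumption, not part of the original question.
    """
    if sum(lengths) != len(items):
        raise ValueError("lengths must sum to len(items)")
    result = []
    start = 0
    for n in lengths:
        # Slice from the running offset; the input list is never re-copied.
        result.append(items[start:start + n])
        start += n
    return result


given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lengths = [4, 3, 2, 2]
result = split_by_lengths(given_list, lengths)
# [[1, 2, 3, 4], [5, 6, 7], [8, 9], [10, 11]]
```

Unlike the version in the question, this leaves given_list untouched, so it can be reused afterwards.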