Convert a list into a list of sublists given lengths of the sublists

Time:10-18

I am given a long list of integers (or any other items) and want to split it into a list of sublists, given the lengths of those sublists.

For example,

given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lengths = [4, 3, 2, 2]
desired_list = [[1, 2, 3, 4], [5, 6, 7], [8, 9], [10, 11]]

What I tried is

result_list1 = list()
for i in lengths:
    result_list1.append(given_list[:i])
    given_list = given_list[i:]

This seems to work, but I don't think it is the best way because it is just a small example and I really have to deal with much bigger data.

Is there a better way?

CodePudding user response:

This approach should be fine

There could be a tiny improvement by tracking the last stopping index rather than re-slicing the remainder of the list on every iteration:

In [8]: %%timeit
   ...: given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
   ...: lengths = [4, 3, 2, 2]
   ...: result_list1 = list()
   ...: for i in lengths:
   ...:     result_list1.append(given_list[:i])
   ...:     given_list = given_list[i:]
   ...:
596 ns ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [9]: %%timeit
   ...: given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
   ...: lengths = [4, 3, 2, 2]
   ...: result_list1 = list()
   ...: start = 0
   ...: for i in lengths:
   ...:     result_list1.append(given_list[start:start + i])
   ...:     start += i
   ...:
540 ns ± 2.48 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

When the list gets long enough, using a list comprehension and the accumulate function from itertools starts to perform better

In [15]: given_list *= 10000

In [16]: lengths *= 10000

In [17]: %%timeit
    ...: result_list1 = list()
    ...: start = 0
    ...: for i in lengths:
    ...:     result_list1.append(given_list[start:start + i])
    ...:     start += i
    ...:
5.32 ms ± 16 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [18]: from itertools import accumulate

In [19]: %%timeit
    ...: result_list1 = [given_list[a:a + i] for i, a in zip(lengths, accumulate(lengths, initial=0))]
    ...:
    ...:
4.61 ms ± 31.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The actual mechanics of it are essentially the same, but this could also allow you to use it as a generator rather than duplicating the entire list in memory.
The only modification to get a generator instead of a list is to replace the outer square brackets with parentheses, i.e.
(given_list[a:a + i] for i, a in zip(lengths, accumulate(lengths, initial=0)))
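As a rough sketch of the generator idea, the same logic can be wrapped in a small function. The name iter_sublists is made up for illustration and is not part of itertools; note that accumulate(..., initial=0) needs Python 3.8 or later.

```python
from itertools import accumulate


def iter_sublists(items, lengths):
    """Yield consecutive slices of `items`, sized according to `lengths`.

    Hypothetical helper for illustration only.
    """
    # accumulate(..., initial=0) produces the running start offsets: 0, 4, 7, 9, ...
    # zip stops once `lengths` is exhausted, so the final total offset is never used.
    starts = accumulate(lengths, initial=0)
    for start, size in zip(starts, lengths):
        yield items[start:start + size]


given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lengths = [4, 3, 2, 2]

# Sublists are produced one at a time; list() is used here only to display them all.
result = list(iter_sublists(given_list, lengths))
print(result)
```

Each sublist is still a copy of its slice, but only one needs to exist at a time if you consume the generator in a loop instead of calling list() on it.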

CodePudding user response:

In your code, doing given_list = given_list[i:] is not great for performance: each iteration copies the entire remainder of the list, on top of the copy already made for the sublist itself.

If you are using a version of Python with the walrus operator (3.8+), you can do this in a list comprehension with an initialized variable, updating the beginning of the slice as you go:

given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lengths = [4, 3, 2, 2]

i = 0
[given_list[i:(i := s + i)] for s in lengths]
# [[1, 2, 3, 4], [5, 6, 7], [8, 9], [10, 11]]
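
If you want this as a reusable function, something like the following might work. The name split_by_lengths and the length check are my own additions, not something from the question; drop the check if leftover items or short final sublists are acceptable for your data.

```python
def split_by_lengths(items, lengths):
    """Split `items` into consecutive sublists with the given lengths.

    Hypothetical helper for illustration; requires Python 3.8+ for `:=`.
    """
    # Assumed requirement: the lengths must account for every item exactly.
    if sum(lengths) != len(items):
        raise ValueError("lengths must sum to len(items)")

    start = 0
    # The lower bound `start` is read first, then the walrus advances it
    # to become both the upper bound here and the next slice's start.
    return [items[start:(start := start + size)] for size in lengths]


given_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lengths = [4, 3, 2, 2]
print(split_by_lengths(given_list, lengths))
```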