I have a large set of data files which I am importing into Jupyter from my desktop using glob.glob('some_file_path.dat'). This returns a single list containing each of the data file paths as strings (1851 in total). What I wish to do is split this massive list across 10 text files, each named differently, so that 9 files contain 185 strings and 1 contains 186 (since there are 1851 in total). I am at a bit of a loss as to how to split the list this 'evenly' when all I specify is the number of text files (10) to split it between.
Any help would be greatly appreciated.
CodePudding user response:
A similar question has been answered before: How do you split a list into evenly sized chunks?
The difference is that you seem to want the chunks to have a shared minimum size (185 in your example). It is easier to split a list into chunks with a shared maximum size: with a chunk size of ceil(1851 / 10) = 186, you would get nine lists of 186 and one list of 177 (since 9 × 186 = 1674, leaving 1851 − 1674 = 177 for the last chunk).
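For comparison, here is a minimal sketch of that simpler shared-maximum split, using the slicing approach from the linked question; the 1851-element placeholder list is an assumption standing in for your glob.glob() output:

import math

paths = ['path'] * 1851  # placeholder for your list of file paths
chunk_size = math.ceil(len(paths) / 10)  # 186
chunks = [paths[i:i + chunk_size] for i in range(0, len(paths), chunk_size)]
print([len(c) for c in chunks])  # [186, 186, 186, 186, 186, 186, 186, 186, 186, 177]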
Here is a way to split the list as you describe. You can do it with fewer lines but I wanted to show the process more clearly:
import math
from pathlib import Path

list_with_1851_strings = ['path'] * 1851  # placeholder for your list of paths
steps = 10
step_size = math.floor(len(list_with_1851_strings) / steps)
# or just use integer division: len(list_with_1851_strings) // steps

for n in range(steps):
    start = n * step_size
    # Give the remainder to the final file: once no more than two full
    # steps remain, let the slice run to the end of the list
    if len(list_with_1851_strings[start:]) > (step_size * 2):
        end = start + step_size
    else:
        # If end is None in a slice, the sub-list goes to the end
        end = None
    sub_list = list_with_1851_strings[start:end]
    # Write the sub-list to disk, one path per line
    sub_list_file = Path(f'sublist_{n}.txt')
    sub_list_file.write_text('\n'.join(sub_list))
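As a quick sanity check, you can read the files back and count the lines; this sketch assumes the ten sublist_*.txt files were written to the current directory:

from pathlib import Path

for n in range(10):
    lines = Path(f'sublist_{n}.txt').read_text().splitlines()
    print(f'sublist_{n}.txt: {len(lines)} paths')
# Expected: 185 paths for sublist_0 through sublist_8, and 186 for sublist_9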