How do I split a large list containing strings into multiple text files


I have a large set of data files which I am importing into Jupyter from my desktop using glob.glob('some_file_path.dat'). This returns a single Python list containing each of the data file paths as a string (1851 in total). I wish to split this list across 10 text files (each named differently) as evenly as possible: since there are 1851 paths, 9 files would contain 185 strings and 1 would contain 186. I am at a bit of a loss about how to go about this.
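For context, the import step looks roughly like this (a sketch; the path is a placeholder for my actual desktop folder, and the wildcard is an assumption):

import glob

# Collect every matching .dat path as a string; in my case this
# produces a list of 1851 file paths.
file_paths = glob.glob('some_file_path/*.dat')
print(len(file_paths))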

Any help would be greatly appreciated.

CodePudding user response:

I found an earlier question that covers this: How do you split a list into evenly sized chunks?

The difference is that you want the chunks to have a shared minimum size (185 in your example). It is easier to split a list into chunks with a shared maximum size: that would give nine lists of 186 and one list of 177 (see the sketch below).
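For comparison, here is a minimal sketch of that maximum-size split, using math.ceil for the chunk size (the variable names are illustrative):

import math

list_with_1851_strings = ['path'] * 1851
steps = 10
chunk_size = math.ceil(len(list_with_1851_strings) / steps)  # 186

# Slicing past the end of a list is safe, so the final chunk
# simply comes out shorter: nine lists of 186 and one of 177.
sub_lists = [
    list_with_1851_strings[i:i + chunk_size]
    for i in range(0, len(list_with_1851_strings), chunk_size)
]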

Here is a way to split the list as you describe. You can do it with fewer lines but I wanted to show the process more clearly:

import math
from pathlib import Path

list_with_1851_strings = ['path'] * 1851
steps = 10
step_size = math.floor(len(list_with_1851_strings) / steps)
# or just do integer division: len(list_with_1851_strings) // steps

for n in range(steps):
    start = n * step_size
    # Take a normal chunk unless only the final, slightly larger
    # remainder (at most two chunks' worth) is left.
    if len(list_with_1851_strings[start:]) > (step_size * 2):
        end = start + step_size
    else:
        # If end is None in a slice, the sub-list goes to the end
        end = None
    sub_list = list_with_1851_strings[start:end]
    
    # Write list to disk
    sub_list_file = Path(f'sublist_{n}.txt')
    sub_list_file.write_text('\n'.join(sub_list))
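
As for the shorter version mentioned above: the same minimum-size split can be written as a single list comprehension, reusing list_with_1851_strings, steps, step_size and Path from the code above (again, a sketch rather than the only way to do it):

sub_lists = [
    list_with_1851_strings[n * step_size:(n + 1) * step_size if n < steps - 1 else None]
    for n in range(steps)
]
for n, sub_list in enumerate(sub_lists):
    Path(f'sublist_{n}.txt').write_text('\n'.join(sub_list))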
