How to batch process text files in a path and create variables named after their file names

Let's say I have multiple text files in my path C:/Users/text_file/

I want to process them in a loop and store each processed file in a variable named after its filename.

To give an idea, if I have in the text_file folder:

readfile_1.txt, readfile_2.txt, readfile_3.txt, ..., readfile_n.txt

and I want to preprocess them with

with open(file_path, 'r', encoding='utf8') as f:
    processed = [x.strip() for x in f]

I did

import os

path = 'C:/Users/text_file/'
files = os.listdir(path)
print(len(files))

txtfiles = {}
for file in files:
    # build the full path to the file
    file_path = path + file
    print('Processing...' + file_path)
    with open(file_path, 'r', encoding='utf8') as f:
        processed = [x.strip() for x in f]
    txtfiles[file_path] = processed

for filename, contents in txtfiles.items():
    print(filename, contents)

But what I want from the loop is variables with the prefix cc_, i.e. cc_readfile_1, cc_readfile_2 and cc_readfile_3

so that whenever I call cc_readfile_1 or cc_readfile_2, the output is the same as if I had done it one by one, i.e.

with open(r'C:\Users\text_file\readfile_1.txt', 'r', encoding='utf8') as f:
    cc_readfile_1 = [x.strip() for x in f]
print(cc_readfile_1)

If you want to know why I need this: I have over 100 text files which I need to process and keep in variables in a Python notebook for further analysis. I do not want to execute the code 100 times, changing the file name and variable name each time.

CodePudding user response：

You can use f-strings to generate the correct keys.

You will then be able to access the contents through the dictionary:

import os

path = 'C:/Users/text_file/'
files = os.listdir(path)
print(len(files))

txtfiles = {}
for file in files:
    file_path = path + file
    print('Processing...' + file_path)
    with open(file_path, 'r', encoding='utf8') as f:
        processed = [x.strip() for x in f]
    # key is the prefix plus the file name without its extension,
    # e.g. readfile_1.txt -> cc_readfile_1
    name = os.path.splitext(file)[0]
    txtfiles[f"cc_{name}"] = processed

for filename, contents in txtfiles.items():
    print(filename, contents)

CodePudding user response：

Use a dictionary where the keys are the files' basenames without extension. There's no real point in adding a constant prefix (cc_).

So, for example, if the filename is readfile_1.txt then the key would simply be readfile_1

The value associated with each key should be a list of all of the (stripped) lines in the file.

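from os.path import join, basename, splitext
from glob import glob

PATH = 'C:/Users/text_file'
EXT = '*.txt'

all_files = dict()

# only files matching *.txt in PATH are read
for file in glob(join(PATH, EXT)):
    # encoding matches the one used in the question
    with open(file, encoding='utf8') as infile:
        # readfile_1.txt -> readfile_1
        key = splitext(basename(file))[0]
        all_files[key] = list(map(str.strip, infile))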

Subsequently, to access the lines from readfile_1.txt it's just:

all_files['readfile_1']
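
If you still want the cc_ prefix from the question, the same dictionary idea can be wrapped in a small helper. This is only a sketch: the function name load_text_files and its pattern and prefix parameters are made up for illustration, it uses pathlib instead of os.path, and the demo writes two throwaway files into a temporary folder (instead of C:/Users/text_file) so it can run anywhere.

```python
import tempfile
from pathlib import Path


def load_text_files(folder, pattern="*.txt", prefix=""):
    """Return {prefix + file stem: list of stripped lines} for matching files.

    Hypothetical helper; the name and parameters are assumptions.
    """
    result = {}
    for file in sorted(Path(folder).glob(pattern)):
        with file.open("r", encoding="utf8") as infile:
            # file.stem is the file name without its extension
            result[prefix + file.stem] = [line.strip() for line in infile]
    return result


# Demo on a temporary folder with two made-up files
with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 2):
        Path(tmp, f"readfile_{i}.txt").write_text(
            f"  line a{i}  \nline b{i}\n", encoding="utf8"
        )
    demo = load_text_files(tmp, prefix="cc_")

print(sorted(demo))           # ['cc_readfile_1', 'cc_readfile_2']
print(demo["cc_readfile_1"])  # ['line a1', 'line b1']
```

With the real folder it would be something like cc = load_text_files('C:/Users/text_file', prefix='cc_'), and then cc['cc_readfile_1'] plays the role of the cc_readfile_1 variable. Keeping everything in one dictionary also makes it easy to loop over all 100+ files in the notebook later.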