How to batch process text files in path and create variables according to their file name

Time:08-15

Let's say I have multiple text files in the folder C:/Users/text_file/

I want to process them in a loop and store each processed file in a variable named after the file.

To give an idea, suppose the text_file folder contains:

readfile_1.txt, readfile_2.txt, readfile_3.txt, ..., readfile_n.txt

and I want to preprocess each of them with

with open(file_path, 'r', encoding='utf8') as f:
    processed = [x.strip() for x in f]

This is what I tried:

import os
path = 'C:/Users/text_file/'
files = os.listdir(path)
print(len(files))

txtfiles = {}
for file in files:
    file_path = path + file  # path already ends with '/'
    print('Processing...', file_path)
    with open(file_path, 'r', encoding='utf8') as f:
        processed = [x.strip() for x in f]
    txtfiles[file_path] = processed

for filename, contents in txtfiles.items():
    print(filename, contents)

But what I want from the loop is variables with the prefix cc_, i.e. cc_readfile_1, cc_readfile_2 and cc_readfile_3,

so that whenever I refer to cc_readfile_1 or cc_readfile_2, I get the same result as if I had processed each file one by one, i.e.

with open(r'C:\Users\text_file\readfile_1.txt', 'r', encoding='utf8') as f:
    cc_readfile_1 = [x.strip() for x in f]
print(cc_readfile_1)

In case you are wondering why I need this: I have over 100 text files that I need to process and keep in variables in a Python notebook for further analysis. I do not want to run the code 100 times, changing the file name and variable name each time.

CodePudding user response:

You can use an f-string to generate the prefixed key for each file. The processed lines are then accessed through the dictionary rather than through separate variables:

import os
path = 'C:/Users/text_file/'
files = os.listdir(path)
print(len(files))

txtfiles = {}
for file in files:
    file_path = path + file  # path already ends with '/'
    print('Processing...', file_path)
    with open(file_path, 'r', encoding='utf8') as f:
        processed = [x.strip() for x in f]
    # Key on the bare file name without extension, e.g. cc_readfile_1
    name = os.path.splitext(file)[0]
    txtfiles[f"cc_{name}"] = processed

for filename, contents in txtfiles.items():
    print(filename, contents)
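
To show in isolation how a key such as cc_readfile_1 can be derived from a file name, here is a minimal sketch. The helper name make_key and its prefix parameter are made up for illustration and are not part of the answer above.

```python
import os

def make_key(file_name, prefix="cc_"):
    """Build a dictionary key from a file name (hypothetical helper)."""
    # Drop any directory part, then drop the extension.
    stem, _ext = os.path.splitext(os.path.basename(file_name))
    return f"{prefix}{stem}"

key = make_key("readfile_1.txt")
print(key)  # cc_readfile_1
```

The same helper also accepts a full path, since os.path.basename removes the folder part first.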

CodePudding user response:

Use a dictionary where the keys are the files' basenames without extension. There's no real point in adding a constant prefix (cc_).

So, for example, if the filename is readfile_1.txt then the key would simply be readfile_1

The value associated with each key should be a list of all of the (stripped) lines in the file.

from os.path import join, basename, splitext
from glob import glob

PATH = 'C:/Users/text_file'
EXT = '*.txt'

all_files = dict()

for file in glob(join(PATH, EXT)):
    with open(file, encoding='utf8') as infile:
        key = splitext(basename(file))[0]
        all_files[key] = list(map(str.strip, infile))

Subsequently, to access the lines from readfile_1.txt it's just:

all_files['readfile_1']
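
The same idea can be wrapped in a reusable function. The following is only a sketch using pathlib: the function name load_text_files and its parameters are assumptions for illustration, and the demo runs against a temporary folder instead of C:/Users/text_file so it can be tried anywhere.

```python
import tempfile
from pathlib import Path

def load_text_files(folder, pattern="*.txt", encoding="utf8"):
    """Return {file name without extension: [stripped lines]} (hypothetical helper)."""
    result = {}
    for file in sorted(Path(folder).glob(pattern)):
        with open(file, "r", encoding=encoding) as infile:
            result[file.stem] = [line.strip() for line in infile]
    return result

# Demo on a throwaway folder with two .txt files and one file that should be skipped.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "readfile_1.txt").write_text("  alpha \nbeta\n", encoding="utf8")
    Path(tmp, "readfile_2.txt").write_text("gamma\n", encoding="utf8")
    Path(tmp, "notes.md").write_text("ignored\n", encoding="utf8")
    all_files = load_text_files(tmp)

print(all_files["readfile_1"])  # ['alpha', 'beta']
```

If separate notebook variables really are required, running globals().update({f"cc_{k}": v for k, v in all_files.items()}) at the top level of the notebook would create them, though the dictionary is usually easier to loop over for further analysis.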