To frame the question: I am searching a directory for all csv files and saving the path of each file, along with the path split into its components, in a DataFrame. I now want to iterate over that DataFrame and read each csv file into its own DataFrame, with a name generated from the original filename. I cannot figure out how to dynamically generate these DataFrames. I started coding a few days ago, so apologies if the syntax is poor.
# Looks in a given directory and all subsequent subdirectories for the extension ".csv"
# Reads path to all csv files and creates a list
import os
from glob import glob

import pandas as pd

PATH = r"Z:\Adam"  # raw string so the backslash is not treated as an escape
EXT = "*.csv"
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
# The list of csv file directories is read into a DataFrame
# Dataframe is then split into columns based on the \\ found in the path
df_csv_path = pd.DataFrame(all_csv_files, columns=['Path'])
df_split_path = df_csv_path['Path'].str.split('\\', n=-1, expand=True)
df_split_path = df_split_path.rename(columns={
    0: 'Drive', 1: 'Main', 2: 'Project', 3: 'Imaging Folder',
    4: 'Experimental Group', 5: 'Experimental Rep', 6: 'File Name'})
df_csv_info = df_split_path.join(df_csv_path['Path'])
# Generates a Dataframe for each of the csv files found in directory
# Dataframe has a name based on the csv filename
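for index in df_csv_info.index:
    filepath = ""
    filename = df_csv_info['File Name'].values[index]
    filepath = str(df_csv_info['Path'].values[index])
    # This only rebinds the variable `filename` on every pass;
    # it does not create a DataFrame named after the file.
    filename = pd.read_csv(filepath)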
CodePudding user response:
The best way is to create a dictionary whose keys are the filenames and whose values are the corresponding DataFrames. Instead of using os.path and glob, the modern approach is to use pathlib from the standard library.
Assuming that you don't actually need the DataFrame containing the filenames and just want the DataFrames for each csv file, you can simply do
from pathlib import Path

import pandas as pd

PATH = Path(r"Z:\Adam")  # raw string so the backslash is kept literally
EXT = "*.csv"
# dictionary holding all the files' DataFrames with the format {"filename": file_DataFrame}
files_dfs = {}
# recursive search for csv files in PATH folder and subfolders
for csv_file in PATH.rglob(EXT):
    filename = csv_file.name  # get the filename
    df = pd.read_csv(csv_file)  # read the csv file as a DataFrame
    files_dfs[filename] = df  # add the DataFrame to the dictionary
Then, to access the DataFrame of a specific file (the keys are the file names, including the .csv extension), you can do
filename_df = files_dfs["<filename>"]
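One caveat with using the bare filename as the key: if two subfolders contain a file with the same name, the later one overwrites the earlier one in the dictionary. Below is a minimal sketch of one way around that, assuming it is acceptable to key each DataFrame by its path relative to the search root instead. The function name load_csvs and the key format (forward slashes, no extension) are illustrative choices of mine, not something from the question.

```python
from pathlib import Path

import pandas as pd


def load_csvs(root):
    """Return {relative path without extension: DataFrame} for every csv under root.

    `load_csvs` and the key format are hypothetical names chosen for illustration.
    """
    root = Path(root)
    dfs = {}
    # rglob searches root and all of its subfolders
    for csv_file in sorted(root.rglob("*.csv")):
        # Key such as "GroupA/Rep1/data": stays unique even when several
        # folders contain a file with the same name.
        key = csv_file.relative_to(root).with_suffix("").as_posix()
        dfs[key] = pd.read_csv(csv_file)
    return dfs
```

Calling load_csvs(r"Z:\Adam") would then return a dictionary whose keys carry the folder structure, so the experimental group and rep can still be recovered from a key with key.split("/").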