Generate Pandas DataFrames from CSV file list-CodePudding

To frame the question. I am searching a directory for all csv files. I am saving the path of each csv file along with the delineation into a DataFrame. I know want to iterate over the DataFrame, and read in the specific csv file into a dataframe with a name generated from the original filename. I cannot figure out how to dynamically generate these dataframes. I started coding a few days ago so apologies if the syntax is poor.

# Looks in a given directory and all subsequent subdirectories for the extension ".csv"
# Reads path to all csv files and creates a list

PATH = "Z:\Adam"
EXT = "*.csv"
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
# The list of csv file directories is read into a DataFrame
# Dataframe is then split into columns based on the \\ found in the path

df_csv_path = pd.DataFrame(all_csv_files, columns =['Path'])
df_split_path = df_csv_path['Path'].str.split('\\', n = -1, expand = True)
df_split_path = df_split_path.rename(columns = {0:'Drive',1:'Main',2:'Project',3:'Imaging Folder', 4:'Experimental Group',5:'Experimental Rep',6:'File Name'})
df_csv_info = df_split_path.join(df_csv_path['Path'])

# Generates a Dataframe for each of the csv files found in directory
# Dataframe has a name based on the csv filename
for index in df_csv_info.index:
    filepath = ""
    filename = df_csv_info['File Name'].values[index]
    filepath = str(df_csv_info['Path'].values[index])
    filename = pd.read_csv(filepath)

CodePudding user response：

The best way is to create a dictionary whose keys are the filenames and the values are the corresponding DataFrame. Instead of using os.path and glob, the modern approach is to use pathlib from the standard library.

Assuming that you don't actually need the DataFrame containing the filenames and just want the DataFrames for each csv file, you can simply do

from pathlib import Path

PATH = Path("Z:\Adam")
EXT = "*.csv"

# dictionary holding all the files DataFrames with the format {"filename": file_DataFrame}
files_dfs = {}

# recursive search for csv files in PATH folder and subfolders 
for csv_file in PATH.rglob(EXT):
    filename = csv_file.name     # get the filename 
    df = pd.read_csv(csv_file)   # read the csv file as a DataFrame
    files_dfs[filename] = df     # add the DataFrame to the dictionary

Then, to access the DataFrame of a specific file you can do

filename_df = files_dfs["<filename>"]