Creating a pandas DataFrame from loops in file

I have a folder containing yearly datasets from 1987 to 2008, each in ".csv" format. I would like a loop that reads each file into a pandas DataFrame and names it after the file name, excluding the .csv extension.

I have tried this:

import glob

flight_data = []
df_lists = []

for flights_file in glob.glob("../datasets/*.csv"):
    flight_data.append(flights_file)
    # Build a name such as "df_1988": last path component minus ".csv"
    df_lists.append('df_' + flights_file.split("\\")[-1][:-4])

but I am stuck on how to read each file into a DataFrame and refer to it afterwards.

I am trying to use a loop to avoid loading each file individually like:

df_1988 = pd.read_csv("../datasets/1988.csv")

df_1989 = pd.read_csv("../datasets/1989.csv")

df_1990 = pd.read_csv("../datasets/1990.csv")

df_1991 = pd.read_csv("../datasets/1991.csv")

df_1992 = pd.read_csv("../datasets/1992.csv")

Thank you

CodePudding user response：

You could try something like this:

import glob
import pandas as pd

for file in glob.glob("../datasets/*.csv"):
    df = pd.read_csv(file)
    # Replace the column names with a fixed list
    df.columns = ['year', 'month', 'day', 'dep_time', 'sched_dep_time', 'dep_delay',
                  'arr_time', 'sched_arr_time', 'arr_delay', 'carrier', 'flight',
                  'tailnum', 'origin', 'dest', 'air_time', 'distance', 'hour', 'minute',
                  'time_hour']
    # Note: this overwrites the original file on disk
    df.to_csv(file, index=False)

CodePudding user response：

Creating variables dynamically with names like df_1990, df_1991, ... is possible in Python but discouraged. We can, however, use a dictionary to store your DataFrames, keyed by file name, as follows:

import glob
import os
import pandas as pd

all_df = {}
for file in glob.glob("../datasets/*.csv"):
    df = pd.read_csv(file)
    df.columns = ['year', 'month', 'day', 'dep_time', 'sched_dep_time', 'dep_delay',
                  'arr_time', 'sched_arr_time', 'arr_delay', 'carrier', 'flight',
                  'tailnum', 'origin', 'dest', 'air_time', 'distance', 'hour', 'minute',
                  'time_hour']
    # Extract the file name without directory or extension, e.g. "1990".
    # file.find(".") would match the leading ".." in the path, so use os.path instead.
    file_name = os.path.splitext(os.path.basename(file))[0]
    all_df[file_name] = df

NOTE: Assuming that the filenames are unique.
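
To make the dictionary approach concrete, here is a minimal, self-contained sketch. It uses pathlib's Path.stem as one alternative way to get the file name without its extension. The function name load_csvs_by_stem, the prefix argument, and the two tiny demo files are made up for illustration; they are not part of the original dataset or of pandas.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csvs_by_stem(folder, prefix="df_"):
    """Read every .csv in `folder` into a dict keyed by prefix + file stem.

    Hypothetical helper for illustration only.
    """
    frames = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # Path.stem drops the directory and the final extension:
        # ".../1990.csv" -> "1990"
        frames[prefix + path.stem] = pd.read_csv(path)
    return frames


# Demo on throwaway files with made-up values (not the real flight data).
with tempfile.TemporaryDirectory() as tmp:
    for year in (1988, 1989):
        pd.DataFrame({"year": [year, year], "dep_delay": [5, -2]}).to_csv(
            Path(tmp) / f"{year}.csv", index=False
        )
    all_df = load_csvs_by_stem(tmp)

print(sorted(all_df))           # ['df_1988', 'df_1989']
print(all_df["df_1988"].shape)  # (2, 2)
```

With this in place, all_df["df_1988"] plays the role that a separate df_1988 variable would have played, and looping over all_df.items() visits every year without listing the names by hand.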