I have a folder containing yearly datasets from 1987 to 2008, each in its own ".csv" file. I would like a loop that reads each file into a pandas DataFrame and names it after the file name, excluding the .csv extension.
I have tried this:
import glob

flight_data = []
df_lists = []
for flights_file in glob.glob("../datasets/*.csv"):
    flight_data.append(flights_file)
    # Build a name such as "df_1988" from the file name without ".csv"
    df_lists.append('df_' + flights_file.split("\\")[-1][:-4])
but I am stuck on reading each file into a DataFrame and then referring to it by that name afterwards.
I am trying to use a loop to avoid loading each file individually like:
df_1988 = pd.read_csv("../datasets/1988.csv")
df_1989 = pd.read_csv("../datasets/1989.csv")
df_1990 = pd.read_csv("../datasets/1990.csv")
df_1991 = pd.read_csv("../datasets/1991.csv")
df_1992 = pd.read_csv("../datasets/1992.csv")
Thank you
CodePudding user response:
You could try something like this:
import glob
import pandas as pd

for file in glob.glob("../datasets/*.csv"):
    df = pd.read_csv(file)
    # Rename the columns (the file must have exactly these 19 columns)
    df.columns = ['year', 'month', 'day', 'dep_time', 'sched_dep_time', 'dep_delay',
                  'arr_time', 'sched_arr_time', 'arr_delay', 'carrier', 'flight',
                  'tailnum', 'origin', 'dest', 'air_time', 'distance', 'hour', 'minute',
                  'time_hour']
    # Write the renamed data back over the original file
    df.to_csv(file, index=False)
CodePudding user response:
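Creating variables dynamically with names like df_1990, df_1991, ... is technically possible in Python (for example through globals()), but it is discouraged because such names are hard to track and easy to overwrite. A dictionary is the usual way to store the DataFrames under those names instead: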
import glob
import os
import pandas as pd

all_df = {}
for file in glob.glob("../datasets/*.csv"):
    df = pd.read_csv(file)
    df.columns = ['year', 'month', 'day', 'dep_time', 'sched_dep_time', 'dep_delay',
                  'arr_time', 'sched_arr_time', 'arr_delay', 'carrier', 'flight',
                  'tailnum', 'origin', 'dest', 'air_time', 'distance', 'hour', 'minute',
                  'time_hour']
    # Extract the file name without directory or extension, e.g. "1990".
    # (file.find(".") would match the leading ".." of the path, so use os.path.)
    file_name = os.path.splitext(os.path.basename(file))[0]
    all_df[file_name] = df
NOTE: Assuming that the filenames are unique.
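If you want keys that look like the variable names from the question (df_1988, df_1989, ...), here is a minimal sketch of the same dictionary idea using pathlib. The function name load_yearly_frames and the "df_" key prefix are my own choices for illustration, not anything pandas provides, and the column renaming is left out on purpose.

```python
from pathlib import Path

import pandas as pd


def load_yearly_frames(folder):
    """Read every .csv file in `folder` into a dict keyed by 'df_<file stem>'."""
    frames = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        # Path.stem is the file name without its final extension, e.g. "1990".
        frames[f"df_{csv_path.stem}"] = pd.read_csv(csv_path)
    return frames
```

With the folder from the question you would call all_df = load_yearly_frames("../datasets") and then use all_df["df_1990"] wherever you would have used the variable df_1990. Looping over all_df.items() gives you each name together with its DataFrame.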