I would like to combine multiple DataFrames in a for loop using the concat function and save the result in a DataFrame called all_dfs, but on every iteration the loop throws away the df that was in all_dfs before. Any tips on how I could solve this?
for i in vd_files_list:
    ### Extract the scenario name without the VD extension
    print(i)
    scenario_name_w_vd = i.split("/")[-1]
    scenario_name = scenario_name_w_vd.split(".")[0]
    try:
        VD_filename = r"{}".format(i)
        df = pd.read_csv(filepath_or_buffer=VD_filename,
                         skiprows=(13),
                         names=("Attribute", "Commodity", "Process", "Period", "Region", "Vintage", "TimeSlice", "UserConstraint", "PV"),
                         dtype={"Attribute": str, "Commodity": str, "Process": str, "Period": str, "Region": str, "Vintage": str, "TimeSlice": str, "UserConstraint": str, "PV": float})
        # an extra column "Szenario" holding the scenario name is added here
        df["Szenario"] = scenario_name
        all_dfs = pd.concat([df])
        print(all_dfs)
CodePudding user response:
Your loop rebinds all_dfs on every iteration to pd.concat([df]), which contains only the current df, so whatever was in all_dfs before is discarded. Initialise all_dfs to a new DataFrame before your loop, and then append to it with each iteration.
all_dfs = pd.DataFrame()
for i in vd_files_list:
    ### Extract the scenario name without the VD extension
    print(i)
    scenario_name_w_vd = i.split("/")[-1]
    scenario_name = scenario_name_w_vd.split(".")[0]
    try:
        VD_filename = r"{}".format(i)
        df = pd.read_csv(filepath_or_buffer=VD_filename,
                         skiprows=(13),
                         names=("Attribute", "Commodity", "Process", "Period", "Region", "Vintage", "TimeSlice", "UserConstraint", "PV"),
                         dtype={"Attribute": str, "Commodity": str, "Process": str, "Period": str, "Region": str, "Vintage": str, "TimeSlice": str, "UserConstraint": str, "PV": float})
        # an extra column "Szenario" holding the scenario name is added here
        df["Szenario"] = scenario_name
        all_dfs.append(df)
    except:
        # what errors do you need to handle?
        pass
print(all_dfs)
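Since the question asked about concat: one possible alternative is to collect the frames in a plain Python list and call pd.concat once after the loop. What follows is only a sketch under a few assumptions. The names load_scenarios, COLUMNS and DTYPES and the skiprows parameter are made up for illustration, and error handling is left out. It also does not rely on DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Column layout taken from the read_csv call above.
COLUMNS = ["Attribute", "Commodity", "Process", "Period", "Region",
           "Vintage", "TimeSlice", "UserConstraint", "PV"]
# Every column is read as text except the numeric "PV" column.
DTYPES = {name: str for name in COLUMNS[:-1]}
DTYPES["PV"] = float


def load_scenarios(paths, skiprows=13):
    """Hypothetical helper: read every file in `paths` into one DataFrame."""
    frames = []
    for path in paths:
        # Scenario name = file name up to the first dot, as in the loop above.
        scenario_name = path.split("/")[-1].split(".")[0]
        df = pd.read_csv(path, skiprows=skiprows, names=COLUMNS, dtype=DTYPES)
        df["Szenario"] = scenario_name
        frames.append(df)  # list.append mutates the list in place
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame(columns=COLUMNS + ["Szenario"])
    # A single concat at the end; ignore_index renumbers the rows from 0.
    return pd.concat(frames, ignore_index=True)
```

Calling all_dfs = load_scenarios(vd_files_list) would then give the combined frame. Concatenating once also avoids copying the growing DataFrame on every iteration.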
CodePudding user response:
That almost solved the problem! The only thing I changed was writing all_dfs = all_dfs.append(df) instead of all_dfs.append(df), because append returns a new DataFrame rather than modifying all_dfs in place.
This is how the working code looks now (plus I added the exceptions):
all_dfs = pd.DataFrame()
for i in vd_files_list:
    ### Extract the scenario name without the VD extension
    print(i)
    scenario_name_w_vd = i.split("/")[-1]
    scenario_name = scenario_name_w_vd.split(".")[0]
    try:
        VD_filename = r"{}".format(i)
        df = pd.read_csv(filepath_or_buffer=VD_filename,
                         skiprows=(13),
                         names=("Attribute", "Commodity", "Process", "Period", "Region", "Vintage", "TimeSlice", "UserConstraint", "PV"),
                         dtype={"Attribute": str, "Commodity": str, "Process": str, "Period": str, "Region": str, "Vintage": str, "TimeSlice": str, "UserConstraint": str, "PV": float})
        # an extra column "Szenario" holding the scenario name is added here
        df["Szenario"] = scenario_name
        all_dfs = all_dfs.append(df)
        print(all_dfs)
    except ValueError:
        # message text: "The selected file is invalid"
        tk.messagebox.showerror("Information", "Die ausgewählte Datei ist ungültig")
        return None
    except FileNotFoundError:
        # message text: "The file {file_path} does not exist"
        tk.messagebox.showerror("Information", f" Die Datei {file_path} existiert nicht")
        return None
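To illustrate why the reassignment was needed, here is a small sketch with two toy frames (the data is invented for the example). It uses pd.concat, which, like append, returns a new DataFrame and leaves its inputs untouched.

```python
import pandas as pd

# Two toy frames standing in for all_dfs and a newly read df.
all_dfs = pd.DataFrame({"PV": [1.0]})
df = pd.DataFrame({"PV": [2.0]})

# The return value is thrown away here, so all_dfs itself is unchanged.
pd.concat([all_dfs, df])
rows_without_assignment = len(all_dfs)

# Rebinding the name keeps the combined result.
all_dfs = pd.concat([all_dfs, df], ignore_index=True)
rows_with_assignment = len(all_dfs)

print(rows_without_assignment, rows_with_assignment)
```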