Connecting DataFrames via for loop

I would like to combine multiple DataFrames using a for loop and the concat function, and save the result in a DataFrame called all_dfs. However, on every iteration of the loop, the DataFrame that was previously in all_dfs gets replaced by the new one. Any tips on how I could solve this?

for i in vd_files_list:
    
    ### Extract the scenario name without the VD extension
    print(i)
    scenario_name_w_vd = i.split("/")[-1]
    scenario_name = scenario_name_w_vd.split(".")[0]


    try:
        
        VD_filename = r"{}".format(i)
        df = pd.read_csv(filepath_or_buffer=VD_filename,
                         skiprows=(13),
                         names =("Attribute", "Commodity", "Process", "Period","Region", "Vintage", "TimeSlice", "UserConstraint","PV"),
                        dtype={"Attribute":str, "Commodity":str, "Process":str, "Period":str,"Region":str, "Vintage":str, "TimeSlice":str, "UserConstraint":str,"PV":float})

        # add an extra column "Szenario" holding the scenario name
        df["Szenario"] = scenario_name
            
        all_dfs = pd.concat([df])
        print(all_dfs)

CodePudding user response：

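The problem is not variable scope: a Python for loop does not create a new scope. Each iteration overwrites all_dfs with pd.concat([df]), which contains only the current df, so whatever was stored before is lost. Initialise all_dfs to an empty DataFrame before the loop, and then append to it in each iteration.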

all_dfs = pd.DataFrame()

for i in vd_files_list:
    ### Extract the scenario name without the VD extension
    print(i)
    scenario_name_w_vd = i.split("/")[-1]
    scenario_name = scenario_name_w_vd.split(".")[0]


    try:
        VD_filename = r"{}".format(i)
        df = pd.read_csv(filepath_or_buffer=VD_filename,
                         skiprows=(13),
                         names =("Attribute", "Commodity", "Process", "Period","Region", "Vintage", "TimeSlice", "UserConstraint","PV"),
                        dtype={"Attribute":str, "Commodity":str, "Process":str, "Period":str,"Region":str, "Vintage":str, "TimeSlice":str, "UserConstraint":str,"PV":float})

        # add an extra column "Szenario" holding the scenario name
        df["Szenario"] = scenario_name
            
        all_dfs.append(df)
    except:
        # what errors do you need to handle?
        pass

print(all_dfs)
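
One caveat worth illustrating: DataFrame.append returned a new DataFrame rather than modifying the original, and the same is true of pd.concat, so the result has to be assigned back to the variable. The sketch below uses pd.concat because DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. The two tiny frames are made up for illustration and are not the real VD data.

```python
import pandas as pd

# Made-up stand-ins for two frames read inside the loop.
first = pd.DataFrame({"PV": [1.0], "Szenario": ["a"]})
second = pd.DataFrame({"PV": [2.0], "Szenario": ["b"]})

all_dfs = first

# The combined frame is returned, not stored: all_dfs is unchanged here.
pd.concat([all_dfs, second])
rows_without_assignment = len(all_dfs)

# Assigning the return value back keeps the combined frame.
all_dfs = pd.concat([all_dfs, second], ignore_index=True)
rows_with_assignment = len(all_dfs)

print(rows_without_assignment, rows_with_assignment)
```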

CodePudding user response：

It almost solved the problem! The only thing I changed was writing all_dfs = all_dfs.append(df) instead of all_dfs.append(df), because append returns a new DataFrame rather than modifying all_dfs in place.

This is how the working code looks now (plus I added the exceptions):

all_dfs = pd.DataFrame()

for i in vd_files_list:
    
    ### Extract the scenario name without the VD extension
    print(i)
    scenario_name_w_vd = i.split("/")[-1]
    scenario_name = scenario_name_w_vd.split(".")[0]


    try:
        
        VD_filename = r"{}".format(i)
        df = pd.read_csv(filepath_or_buffer=VD_filename,
                         skiprows=(13),
                         names =("Attribute", "Commodity", "Process", "Period","Region", "Vintage", "TimeSlice", "UserConstraint","PV"),
                        dtype={"Attribute":str, "Commodity":str, "Process":str, "Period":str,"Region":str, "Vintage":str, "TimeSlice":str, "UserConstraint":str,"PV":float})

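        # add an extra column "Szenario" holding the scenario name
        df["Szenario"] = scenario_name

        # append returns a new DataFrame, so the result is assigned back
        # (DataFrame.append was removed in pandas 2.0; there, use pd.concat([all_dfs, df]))
        all_dfs = all_dfs.append(df)
        print(all_dfs)
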
    except ValueError:
        tk.messagebox.showerror("Information", "The selected file is invalid")
        return None
    except FileNotFoundError:
        tk.messagebox.showerror("Information", f"The file {VD_filename} does not exist")
        return None
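
For readers on pandas 2.0 or later, where DataFrame.append no longer exists, a rough equivalent of the loop above could collect the frames in a plain list and call pd.concat once at the end. This is only a sketch under some assumptions: load_scenarios is a hypothetical helper name, the tkinter message boxes are replaced by a list of skipped paths so the code runs without a GUI, and the temporary file with 13 dummy header lines stands in for a real VD file. Note that Path(...).stem strips only the last extension, which differs from split(".")[0] for file names containing several dots.

```python
import tempfile
from pathlib import Path

import pandas as pd

COLUMNS = ["Attribute", "Commodity", "Process", "Period", "Region",
           "Vintage", "TimeSlice", "UserConstraint", "PV"]
DTYPES = {name: str for name in COLUMNS}
DTYPES["PV"] = float


def load_scenarios(paths, skiprows=13):
    """Hypothetical helper: read every file and stack the rows into one frame."""
    frames, skipped = [], []
    for path in paths:
        try:
            df = pd.read_csv(path, skiprows=skiprows, names=COLUMNS, dtype=DTYPES)
        except (FileNotFoundError, ValueError):
            # Record the problem instead of showing a message box.
            skipped.append(str(path))
            continue
        # File name without directory or final extension.
        df["Szenario"] = Path(path).stem
        frames.append(df)  # list.append works in place
    if not frames:
        return pd.DataFrame(columns=COLUMNS + ["Szenario"]), skipped
    # A single concat after the loop; ignore_index renumbers the rows.
    return pd.concat(frames, ignore_index=True), skipped


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "scenario_a.vd"
    # 13 dummy header lines followed by one made-up data row.
    good.write_text("* header\n" * 13
                    + "VAR_Act,COM1,PRC1,2020,DE,2020,ANNUAL,UC1,1.5\n")
    missing = Path(tmp) / "does_not_exist.vd"
    all_dfs, skipped = load_scenarios([good, missing])

print(all_dfs)
print("skipped:", skipped)
```

Building the list first and concatenating once also avoids copying the growing frame on every iteration, which matters when many files are read.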