How to append/add columns to pandas dataframe in loop?

Time:01-12

I have a set of functions that run on each column of a dataframe in turn, and each call produces an output for that column. I am trying to find a way to store these outputs in a newly initialized pandas dataframe as they are generated. The outputs for the different columns have different values but the same length (e.g. index 0 to 3, i.e. 4 rows).

(So, essentially, in each iteration a column is selected from the original dataframe and passed to a function, the function generates an output, and I want to keep appending those outputs as columns of a new df.) When I initialize an empty dataframe before the for loop and then add the column to it using df.assign, the code doesn't work. Can someone please help?

The example of output:

index
0     24
1     59
2     43.7
3     9.8

The main part of the code looks like this:

def main():
    df_new = pd.Dataframe()      #initializing empty dataframe
    for col in df.columns:
        col_df = df_full[[col]]
        col_df.reset_index(inplace=True)

        #calling function 1 that produces the output()
        #Lets say the output is stored in variable 'value_generated'
         df.assign(value_generated)

Expected new df (with dummy values):

index     Col1      Col2       Col3      Col4           
0         24        21         20        24.8           
1         59        50         61.1      60.3   
2         43.7      4          48        49
        
        

CodePudding user response:

This code iterates through the columns of pre_existing_df. Replace give_output() with your own function(s). As written, it adds one column to df_new for each column of pre_existing_df, using the same column name, and fills it with [1, 2, 3].

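import pandas as pd

def give_output():
    # Placeholder: replace with your own function(s)
    return [1, 2, 3]

def main():
    df_new = pd.DataFrame()  # note the capital "F" in DataFrame
    for col in pre_existing_df.columns:
        # Assigning to a new key adds a column with that name
        df_new[col] = give_output()
    return df_new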
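
For reference, here is a self-contained sketch of the same idea that can be run as-is. The sample data and the scale_column function are made up for illustration (they stand in for your original dataframe and your "function 1"), so only the loop pattern is the point. It also shows one likely reason the df.assign call in the question has no effect: assign expects the column name as a keyword argument and returns a new DataFrame instead of modifying the one it is called on, so the result has to be rebound.

```python
import pandas as pd

# Hypothetical input data standing in for the original dataframe.
df = pd.DataFrame({
    "Col1": [1.0, 2.0, 3.0, 4.0],
    "Col2": [10.0, 20.0, 30.0, 40.0],
})

def scale_column(series):
    # Hypothetical stand-in for "function 1": returns one value per row.
    return series * 2

def build_by_assignment(source):
    df_new = pd.DataFrame()
    for col in source.columns:
        # Setting a new key adds a column to df_new in place.
        df_new[col] = scale_column(source[col])
    return df_new

def build_with_assign(source):
    df_new = pd.DataFrame()
    for col in source.columns:
        # assign() returns a new DataFrame, so the result must be rebound.
        # The ** unpacking passes the column name as a keyword argument.
        df_new = df_new.assign(**{col: scale_column(source[col])})
    return df_new

result = build_by_assignment(df)
print(result)
```

Both helpers should produce the same frame. Plain column assignment is usually the simpler choice inside a loop, while assign is handy when you prefer not to mutate an existing dataframe.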
