Iterate through columns of pandas dataframe and create a new dataframe for each selected column in a

Time:01-08

I have a pandas dataframe with multiple columns. I want to iterate through it one column at a time: create a new dataframe from that one column, run some functions on it, then move on to the next column and repeat until I reach the last column in the dataframe.

Currently, I am doing this with only one column, and I am stuck on how to do it in a loop. Could someone please show me how to iterate through the columns, create a new dataframe for each selected column, and run the functions inside that loop?

df:

date                   Col1      Col2       Col3      Col4           
1990-01-02 12:00:00     24        24        24.8      24.8           
1990-01-02 01:00:00     59        58        60        60.3   
1990-01-02 02:00:00     43.7      43.9      48        49

Code

df_new = pd.DataFrame(df['Col1'])
df.reset_index(inplace=True)

def function1(df_new):
    line 1
    line 2

def function2():
    line 1
    line 2

CodePudding user response:

To iterate through the columns of a pandas DataFrame and create a new DataFrame for each selected column in a loop, you can use a for loop and the DataFrame.columns attribute. Here's an example of how you could do this:

for col in df.columns:
    # create a new DataFrame with only the current column
    col_df = df[[col]]

    # perform functions on col_df here


Inside the for loop, col will be a string representing the name of the current column. You can use this to select the column from the original DataFrame and create a new DataFrame with only that column.

You can then perform your desired functions on the new DataFrame col_df.
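A minimal sketch of the loop above that also keeps each single-column DataFrame for later use. The sample values are copied from the question, and the date column is assumed to be the index. The `frames` and `summaries` dictionaries and the `summarize` helper are hypothetical names introduced for illustration; `summarize` only stands in for whatever functions you actually need to run.

```python
import pandas as pd

# Sample data copied from the question; "date" is assumed to be the index.
df = pd.DataFrame(
    {
        "Col1": [24, 59, 43.7],
        "Col2": [24, 58, 43.9],
        "Col3": [24.8, 60, 48],
        "Col4": [24.8, 60.3, 49],
    },
    index=pd.to_datetime(
        ["1990-01-02 12:00:00", "1990-01-02 01:00:00", "1990-01-02 02:00:00"]
    ),
)
df.index.name = "date"


def summarize(col_df):
    """Hypothetical stand-in for the asker's functions: mean of the single column."""
    return col_df.iloc[:, 0].mean()


frames = {}     # column name -> single-column DataFrame
summaries = {}  # column name -> result of summarize()

for col in df.columns:
    col_df = df[[col]]  # double brackets return a DataFrame, not a Series
    frames[col] = col_df
    summaries[col] = summarize(col_df)
```

Keying the dictionaries by column name means each per-column DataFrame and its result can be looked up after the loop, for example `frames["Col2"]`.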

CodePudding user response:

If you insist on iterating through the columns, you get a Series for each column, in which case I don't see the added value of converting it to a DataFrame first.

Instead, perform the functions on each series:

def Add(col):
    return col + 1

def Minus(col):
    return col-1

def Double(col):
    return col*2

for col in df.columns:
    print(Add(df[col]))
    Minus(df[col])
    Double(df[col])

Be sure to save the results if you want to do further manipulations with them when the loop is finished.
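One way to save the results, sketched under the assumption that each function returns a Series of the same length as its input. The `results` dictionary and the lowercase `double` helper are illustrative names, and the two-column sample uses values from the question.

```python
import pandas as pd

# Two columns of sample values taken from the question.
df = pd.DataFrame({"Col1": [24, 59, 43.7], "Col2": [24, 58, 43.9]})


def double(col):
    """Return the column with every value multiplied by two."""
    return col * 2


results = {}  # column name -> transformed Series
for col in df.columns:
    results[col] = double(df[col])

# Reassemble the per-column results into a single DataFrame.
doubled = pd.DataFrame(results)
```

Collecting into a dictionary and building the DataFrame once at the end avoids growing a DataFrame inside the loop.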

However, I advise instead looking at other possibilities, for example using apply() and lambda:

df.apply(lambda x: x + 1, axis=0)

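This is more concise and avoids writing the loop by hand. For plain arithmetic like this, operating on the whole DataFrame directly (df + 1) is vectorized and typically faster still.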
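For simple element-wise arithmetic, the same result can also be obtained without apply() by operating on the whole DataFrame. A minimal sketch using sample values from the question; the variable names are illustrative only.

```python
import pandas as pd

# Two columns of sample values taken from the question.
df = pd.DataFrame({"Col1": [24, 59, 43.7], "Col2": [24, 58, 43.9]})

# Column-wise apply: the lambda receives each column as a Series.
via_apply = df.apply(lambda x: x + 1, axis=0)

# Direct arithmetic on the DataFrame broadcasts the scalar to every cell.
via_broadcast = df + 1
```

Both expressions produce the same DataFrame here. apply() becomes the better fit when the per-column function is more involved than a single arithmetic operation.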
