I have a pandas DataFrame with multiple columns. I am trying to iterate through it one column at a time: select a column, create a new DataFrame with just that column, and run some functions on it. Then select the next column, run the functions again, and continue until I reach the last column.
Currently, I am doing this for only one column, and I am stuck on how to do it in a loop and run the functions inside that loop. Could someone please help me iterate through the columns, create a new DataFrame for each selected column, and run the functions inside the loop?
df:

```
date                 Col1  Col2  Col3  Col4
1990-01-02 12:00:00  24    24    24.8  24.8
1990-01-02 01:00:00  59    58    60    60.3
1990-01-02 02:00:00  43.7  43.9  48    49
```
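To make the examples reproducible, here is a minimal sketch that rebuilds the sample data as a DataFrame. Treating `date` as the index is an assumption (the later call to `reset_index()` suggests it). The values are copied from the table as shown:

```python
import pandas as pd

# Rebuild the sample data; using "date" as the index is an assumption.
df = pd.DataFrame(
    {
        "Col1": [24, 59, 43.7],
        "Col2": [24, 58, 43.9],
        "Col3": [24.8, 60, 48],
        "Col4": [24.8, 60.3, 49],
    },
    index=pd.to_datetime(
        ["1990-01-02 12:00:00", "1990-01-02 01:00:00", "1990-01-02 02:00:00"]
    ),
)
df.index.name = "date"

print(df)
```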
Code:

```python
import pandas as pd

# df is the DataFrame shown above
df_new = pd.DataFrame(df['Col1'])
df.reset_index(inplace=True)

def function1(df_new):
    # line 1
    # line 2
    ...

def function2():
    # line 1
    # line 2
    ...
```
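For reference, here is a small sketch on made-up two-column data. It shows that `pd.DataFrame(df['Col1'])` from the question gives the same one-column DataFrame as selecting with double brackets, `df[['Col1']]`:

```python
import pandas as pd

# Made-up two-column data mirroring the question's values
df = pd.DataFrame({"Col1": [24, 59, 43.7], "Col2": [24, 58, 43.9]})

# The question's approach: wrap the Series df["Col1"] in a new DataFrame
df_new = pd.DataFrame(df["Col1"])

# Double brackets select a list of columns and return a DataFrame directly
same = df[["Col1"]]

print(type(df["Col1"]).__name__)  # Series
print(type(same).__name__)        # DataFrame
print(df_new.equals(same))        # True
```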
Answer:
To iterate through the columns of a pandas DataFrame and create a new DataFrame for each selected column in a loop, you can use a for loop and the DataFrame.columns attribute. Here's an example of how you could do this:
```python
# Assumes df is the DataFrame from the question
for col in df.columns:
    # create a new DataFrame with only the current column
    col_df = df[[col]]
    # perform functions on col_df here
```
Inside the loop, `col` holds the label of the current column (here a string such as `'Col1'`). Selecting it with double brackets, `df[[col]]`, returns a one-column DataFrame, whereas single brackets, `df[col]`, would return a Series.
You can then run your functions on the new DataFrame `col_df`.
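Here is a minimal sketch of the full loop, assuming `date` is the index as in the question. The function `process` is hypothetical and stands in for the question's `function1`. It resets the index (as the question does) and returns a small summary dict. Each result is stored under its column name so it is still available after the loop:

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [24, 59, 43.7], "Col2": [24, 58, 43.9], "Col3": [24.8, 60, 48]},
    index=pd.to_datetime(
        ["1990-01-02 12:00:00", "1990-01-02 01:00:00", "1990-01-02 02:00:00"]
    ),
)
df.index.name = "date"


def process(col_df):
    """Hypothetical stand-in for function1: returns a small summary dict
    for the single-column DataFrame it receives."""
    col_df = col_df.reset_index()  # move "date" back into a regular column
    name = col_df.columns[-1]      # the data column comes after "date"
    return {"column": name, "mean": col_df[name].mean()}


results = {}
for col in df.columns:
    col_df = df[[col]]             # DataFrame holding only the current column
    results[col] = process(col_df)

print(results["Col2"])
```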
Answer:
If you insist on iterating through the columns, you'll get a Series for every column, in which case I don't see the value of converting it to a DataFrame first.
Instead, perform the functions on each series:
```python
def Add(col):
    return col + 1

def Minus(col):
    return col - 1

def Double(col):
    return col * 2

for col in df.columns:
    print(Add(df[col]))
    # the results of Minus and Double are discarded here
    Minus(df[col])
    Double(df[col])
```
Be sure to save the results if you want to do further manipulations with them when the loop is finished.
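Here is a sketch of one way to save the results, using the Add, Minus and Double functions above on made-up two-column data. Each transformed Series is stored in a dict keyed by column name and then reassembled into a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [24, 59, 43.7], "Col2": [24, 58, 43.9]})


def Add(col):
    return col + 1


def Minus(col):
    return col - 1


def Double(col):
    return col * 2


# Keep each transformed Series under its column name so the results
# survive after the loop ends.
added, subtracted, doubled = {}, {}, {}
for col in df.columns:
    added[col] = Add(df[col])
    subtracted[col] = Minus(df[col])
    doubled[col] = Double(df[col])

# Reassemble each dict of Series into a DataFrame aligned on the original index
added_df = pd.DataFrame(added)
subtracted_df = pd.DataFrame(subtracted)
doubled_df = pd.DataFrame(doubled)

print(doubled_df)
```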
However, I advise looking at other options instead, for example using apply() with a lambda:

```python
df.apply(lambda x: x + 1, axis=0)
```
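This is more concise than an explicit loop. Note that `apply(axis=0)` still calls the lambda once per column. For simple arithmetic like this, the vectorized `df + 1` is faster still.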
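Here is a minimal sketch comparing apply() with plain vectorized arithmetic on made-up two-column data. Both give the same result, but the vectorized version does not call a Python function for each column:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [24, 59, 43.7], "Col2": [24, 58, 43.9]})

# apply with axis=0 calls the lambda once per column, passing each as a Series
via_apply = df.apply(lambda x: x + 1, axis=0)

# Vectorized arithmetic operates on all columns at once, with no Python callback
vectorized = df + 1

print(via_apply.equals(vectorized))  # True
```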