Create new columns in a pandas dataframe inside a for loop and give them different names

I have 4 pandas dataframes which are similar in structure to this one:

index   day1    day2   day3   day4   day5 ....
0        1.23   5.41    0      0      2.31
1        2.31   7.15    0      0      1.32 
...

For each row, I want to calculate the mean, std, kurtosis and skewness, and add them as new columns to another existing dataframe. Right now I do this in a for loop, building each column name by appending the loop counter as a string, so that one iteration doesn't overwrite the results of the previous one. It looks like this:

phen_1=rain_calc.iloc[:,:20]
phen_2=rain_calc.iloc[:,20:55]
phen_3=rain_calc.iloc[:,55:70]
phen_4=rain_calc.iloc[:,70:80]
phen_5=rain_calc.iloc[:,70:110]

dfs_phens=[phen_1,phen_2,phen_3,phen_4,phen_5]

phen=1

for df in dfs_phens:

    # build the column names for this group from the loop counter
    mean_col = 'mean_' + str(phen)
    std_col = 'std_' + str(phen)
    skew_col = 'skew_' + str(phen)
    kurt_col = 'kurt_' + str(phen)
    total_col = 'total_' + str(phen)

    # row-wise statistics, written into the existing dataframe
    original_df[mean_col] = df.mean(axis=1)
    original_df[std_col] = df.std(axis=1)
    original_df[skew_col] = df.skew(axis=1)
    original_df[kurt_col] = df.kurt(axis=1)
    original_df[total_col] = df.sum(axis=1)

    phen = phen + 1

This works and gives me the output I want: new columns with the calculated statistics. However, I wonder if there is a smarter and cleaner way to write it :)

So my goal is to improve my script: to name the new columns inside the for loop without building the strings by hand every time, as I'm doing now.

CodePudding user response：

Have you considered using a dict?

# phen starts at 1, as in the question
for df in dfs_phens:
    my_dict = {
        "mean" : df.mean(axis=1),
        "std"  : df.std(axis=1),
        }

    for colname, data in my_dict.items():
        original_df[colname + "_" + str(phen)] = data

    phen = phen + 1
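
For reference, here is a minimal self-contained sketch of the same dict idea that maps each column prefix to the name of the DataFrame method and lets enumerate do the counting. The random data, the three illustrative slices and the name `stats` are assumptions made up for the example; only `rain_calc`, `original_df` and `dfs_phens` come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for rain_calc: 6 rows, 12 "day" columns of random data.
rng = np.random.default_rng(0)
rain_calc = pd.DataFrame(rng.random((6, 12)),
                         columns=[f"day{i + 1}" for i in range(12)])
original_df = pd.DataFrame(index=rain_calc.index)

# Illustrative slices only; the real column ranges are the ones in the question.
dfs_phens = [rain_calc.iloc[:, :4], rain_calc.iloc[:, 4:8], rain_calc.iloc[:, 8:]]

# Column prefix -> name of the DataFrame method that computes it.
stats = {"mean": "mean", "std": "std", "skew": "skew", "kurt": "kurt", "total": "sum"}

# enumerate(..., start=1) replaces the manual phen counter.
for phen, df in enumerate(dfs_phens, start=1):
    for prefix, method in stats.items():
        original_df[f"{prefix}_{phen}"] = getattr(df, method)(axis=1)

print(original_df.columns.tolist())
```

Adding another statistic then only means adding one entry to `stats`, and the column names can no longer drift out of sync with the function that fills them.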

CodePudding user response：

You can try to aggregate these functions over the dataframe with pandas.DataFrame.aggregate.

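for i, df in enumerate(dfs_phens):
    # one column per statistic, suffixed with the 1-based group number
    df_ = (df.agg(['mean', 'std', 'skew', 'kurt', 'sum'], axis='columns')
           .rename(columns=lambda col: f'{col}_{i + 1}'))
    # append to original_df so earlier groups are kept
    original_df = pd.concat([original_df, df_], axis=1)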
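
A variation on this, as a rough sketch: collect the per-group results in a list and concatenate once at the end, which avoids rebuilding `original_df` on every iteration. The toy data, the `station` column and the names `stat_frames` and `result` are assumptions for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for rain_calc and original_df.
rng = np.random.default_rng(1)
rain_calc = pd.DataFrame(rng.random((5, 10)),
                         columns=[f"day{i + 1}" for i in range(10)])
original_df = pd.DataFrame({"station": list("abcde")})

# Illustrative slices only.
dfs_phens = [rain_calc.iloc[:, :5], rain_calc.iloc[:, 5:]]

# One small frame of row-wise statistics per group, columns suffixed _1, _2, ...
stat_frames = [
    df.agg(["mean", "std", "skew", "kurt", "sum"], axis="columns")
      .rename(columns=lambda col: f"{col}_{i}")
    for i, df in enumerate(dfs_phens, start=1)
]

# Single concat along the columns; rows are aligned on the shared index.
result = pd.concat([original_df, *stat_frames], axis=1)
print(result.columns.tolist())
```

Note that the sum column comes out as `sum_1`, `sum_2`, and so on here, not `total_1` as in the question, because the names are taken from the function names passed to `agg`.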