I have 4 pandas dataframes that are similar in structure to this one:
index day1 day2 day3 day4 day5 ....
0 1.23 5.41 0 0 2.31
1 2.31 7.15 0 0 1.32
...
For each row I want to calculate the mean, std, kurtosis and skewness, and add the results as new columns to another existing dataframe. Right now I do this in a for loop: I keep a counter, convert it to a string, and append it to each column name so that one iteration doesn't overwrite the results of the previous one. It looks like this:
phen_1=rain_calc.iloc[:,:20]
phen_2=rain_calc.iloc[:,20:55]
phen_3=rain_calc.iloc[:,55:70]
phen_4=rain_calc.iloc[:,70:80]
phen_5=rain_calc.iloc[:,70:110]
dfs_phens=[phen_1,phen_2,phen_3,phen_4,phen_5]
phen = 1
for df in dfs_phens:
    mean_col = 'mean_' + str(phen)
    std_col = 'std_' + str(phen)
    skew_col = 'skew_' + str(phen)
    kurt_col = 'kurt_' + str(phen)
    total_col = 'total_' + str(phen)
    original_df[mean_col] = df.mean(axis=1)
    original_df[std_col] = df.std(axis=1)
    original_df[skew_col] = df.skew(axis=1)
    original_df[kurt_col] = df.kurt(axis=1)
    original_df[total_col] = df.sum(axis=1)
    phen = phen + 1
This works and gives me the output I want: new columns with the calculated statistics. However, I wonder if there is a smarter and more elegant way to write it :)
So my goal is to improve my script: to name the new columns inside the for loop without building the strings by hand every time, as I'm doing now.
CodePudding user response:
Have you considered using a dict?
phen = 1
for df in dfs_phens:
    my_dict = {
        "mean": df.mean(axis=1),
        "std": df.std(axis=1),
        # add "skew", "kurt" and "total" the same way
    }
    for colname, data in my_dict.items():
        original_df[colname + "_" + str(phen)] = data
    phen = phen + 1
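Here is a fuller sketch of the same idea that you can run on its own. Several things in it are assumptions for illustration only: rain_calc is filled with made-up random numbers, the slice boundaries are shortened, and the `stats` mapping is a hypothetical name that does not appear in the question. `enumerate(..., start=1)` replaces the manual counter, and an f-string builds each column name.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's data: 6 rows, 30 "day" columns.
rng = np.random.default_rng(0)
rain_calc = pd.DataFrame(rng.random((6, 30)),
                         columns=[f"day{d}" for d in range(1, 31)])
original_df = pd.DataFrame(index=rain_calc.index)

# Shortened, illustrative slices (not the question's real boundaries).
dfs_phens = [rain_calc.iloc[:, :10], rain_calc.iloc[:, 10:30]]

# Column prefix -> function computing one row-wise statistic.
stats = {
    "mean": lambda d: d.mean(axis=1),
    "std": lambda d: d.std(axis=1),
    "skew": lambda d: d.skew(axis=1),
    "kurt": lambda d: d.kurt(axis=1),
    "total": lambda d: d.sum(axis=1),
}

# enumerate supplies the suffix; the f-string builds the column name.
for phen, df in enumerate(dfs_phens, start=1):
    for name, func in stats.items():
        original_df[f"{name}_{phen}"] = func(df)
```

Adding another statistic then only means adding one entry to `stats`; the loop itself stays unchanged.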
CodePudding user response:
You can try to aggregate these functions over the dataframe with pandas.DataFrame.aggregate.
for i, df in enumerate(dfs_phens):
    df_ = (df.agg(['mean', 'std', 'skew', 'kurt', 'sum'], axis='columns')
             .rename(columns=lambda col: f'{col}_{i + 1}'))
    original_df = pd.concat([original_df, df_], axis=1)
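A self-contained variant of this approach might look like the sketch below. The data is made up, the slice boundaries are shortened, and the `station` column, `stat_frames` and `result` are hypothetical names used only for illustration. It collects the per-slice statistics in a list and concatenates once at the end, and it uses `add_suffix` in place of the `rename` lambda.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's data: 5 rows, 24 "day" columns.
rng = np.random.default_rng(1)
rain_calc = pd.DataFrame(rng.random((5, 24)),
                         columns=[f"day{d}" for d in range(1, 25)])
# Hypothetical existing dataframe sharing the same row index.
original_df = pd.DataFrame({"station": list("abcde")})

# Shortened, illustrative slices (not the question's real boundaries).
dfs_phens = [rain_calc.iloc[:, :8], rain_calc.iloc[:, 8:24]]

# One small frame of row-wise statistics per slice, suffixed _1, _2, ...
stat_frames = [
    df.agg(["mean", "std", "skew", "kurt", "sum"], axis="columns")
      .add_suffix(f"_{i}")
    for i, df in enumerate(dfs_phens, start=1)
]

# A single concat attaches every statistics column to the existing frame.
result = pd.concat([original_df, *stat_frames], axis=1)
```

Note that the totals come out as `sum_1`, `sum_2`, ... here rather than `total_1`, `total_2`, ..., because the column names are taken from the aggregation function names.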