Home > Mobile >  Groupby year-month and find top N smallest standard deviation values columns in Python
Groupby year-month and find top N smallest standard deviation values columns in Python

Time:11-25

With sample data and code below, I'm trying to groupby year-month and find top K columns with smallest std values inside all the columns endswith _values:

import pandas as pd
import numpy as np
from statistics import stdev

np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])

k = 3    # set k as 3

value_cols = df.columns[df.columns.str.endswith('_values')]

def find_topK_smallest_std(group):
    std = stdev(group[value_cols])
    cols = std.nsmallest(k).index
    out_cols = [f'std_{i 1}' for i in range(k)]
    rv = group.loc[:, cols]
    rv.columns = out_cols
    return rv

df.groupby(pd.Grouper(freq='M'), dropna=False).apply(find_topK_smallest_std)

But it raises a type error, how could I fix this issue? Sincere thanks at advance.

Out:

TypeError: can't convert type 'str' to numerator/denominator

Reference link:

Groupby year-month and find top N smallest values columns in Python

CodePudding user response:

In your solution add DataFrame.apply for stdev per columns, if need per rows add axis=1:

def find_topK_smallest_std(group):
    #procssing per columns
    std = group[value_cols].apply(stdev)
    cols = std.nsmallest(k).index
    out_cols = [f'std_{i 1}' for i in range(k)]
    rv = group.loc[:, cols]
    rv.columns = out_cols
    return rv

df = df.groupby(pd.Grouper(freq='M'), dropna=False).apply(find_topK_smallest_std)
print (df)
               std_1     std_2     std_3
2013-02-26  7.333694  3.126731  1.389472
2013-02-27  7.529254  7.843101  6.621605
2013-02-28  6.165574  5.612724  0.866300
2013-03-01  5.693051  3.711608  4.521452
2013-03-02  7.322250  4.763135  5.178144
             ...       ...       ...
2013-05-22  8.795736  3.864723  6.316478
2013-05-23  7.959282  5.140268  1.839659
2013-05-24  5.412016  5.890717  9.081583
2013-05-25  1.088414  1.610210  9.016004
2013-05-26  4.930571  6.893207  2.338785

[90 rows x 3 columns]
  • Related