Pandas how to calculate the std deviation for all label values except for the selected label?

Time:11-17

I have a DF with labels and values as below:

df = pd.DataFrame({'labels' : ['A','A', 'B', 'C', 'C'],'val' : [1, 2, 3, 4, 5]})

Now, I want to calculate the std. dev as below:

For each row, take the std dev of the values from every other label:

rows with label A: std dev of the B and C values (so the first 2 rows get the std dev of all other rows)

row with label B: std dev of the A and C values (so the third row gets the std dev of all other rows)

rows with label C: std dev of the A and B values (so the last 2 rows get the std dev of all other rows)

How can I achieve this?

CodePudding user response:

Update: to optimise, precompute once per label the std dev of the values belonging to all other labels, then look it up for each row:

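import pandas as pd

df = pd.DataFrame({'labels' : ['A','A', 'B', 'C', 'C'],'val' : [1, 2, 3, 4, 5]})

labels = df.labels.unique()

# for each label, std dev of the values in rows that do NOT carry that label
std_map = {l: df[df.labels != l]["val"].std() for l in labels}

# look up the precomputed value for each row's label
df["std_dev"] = df["labels"].map(std_map)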

Original answer: iterate over the DataFrame rows and, for each row, filter for the rows with other labels and compute their std dev:

import pandas as pd

df = pd.DataFrame({'labels' : ['A','A', 'B', 'C', 'C'],'val' : [1, 2, 3, 4, 5]})

df["std_dev"] = df.apply(lambda row: df[df.labels != row["labels"]]["val"].std(), axis=1)

[Out]:
  labels  val   std_dev
0      A    1  1.000000
1      A    2  1.000000
2      B    3  1.825742
3      C    4  1.000000
4      C    5  1.000000
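
If there are many labels, filtering the frame once per label may still be slow. One possible alternative, sketched here and not part of the answer above, derives each "all other labels" std dev from per-label counts, sums and sums of squares, so the data is grouped only once. The names stats, n_rest, s_rest and ss_rest are made up for illustration, and the sample std dev (ddof=1, the pandas default for .std()) is assumed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"labels": ["A", "A", "B", "C", "C"], "val": [1, 2, 3, 4, 5]})

# Per-label count, sum and sum of squares (one pass over the data).
stats = (
    df.assign(sq=df["val"] ** 2)
      .groupby("labels")
      .agg(n=("val", "size"), s=("val", "sum"), ss=("sq", "sum"))
)

# Totals over the whole frame minus each label's own share
# give the statistics of "every row except this label".
n_rest = len(df) - stats["n"]
s_rest = df["val"].sum() - stats["s"]
ss_rest = (df["val"] ** 2).sum() - stats["ss"]

# Sample variance (ddof=1): (sum of squares - sum^2 / n) / (n - 1).
var_rest = (ss_rest - s_rest ** 2 / n_rest) / (n_rest - 1)

# Clip tiny negative values caused by floating-point rounding before the sqrt.
std_map = np.sqrt(var_rest.clip(lower=0))

# Map each row's label to its precomputed value.
df["std_dev"] = df["labels"].map(std_map)
print(df)
```

On the sample data this should reproduce the std_dev column shown above. Note that the sum-of-squares formula can lose precision when the values are large relative to their spread, in which case the per-label filtering version is the safer choice.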