Pandas: how to calculate the std deviation of all values except those with the selected label?

I have a DataFrame with labels and values, as below:

import pandas as pd
df = pd.DataFrame({'labels' : ['A','A', 'B', 'C', 'C'],'val' : [1, 2, 3, 4, 5]})

Now, for each row, I want the std dev of the values in all rows that have a different label:

rows with A: std dev of the B and C values (the first 2 rows get the std dev of the other 3 rows)

rows with B: std dev of the A and C values (the third row gets the std dev of the other 4 rows)

rows with C: std dev of the A and B values (the last 2 rows get the std dev of the other 3 rows)

How can I achieve this?

CodePudding user response:

Update: to optimise, precompute the std dev once per label, then look it up for each row:

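import pandas as pd

df = pd.DataFrame({'labels' : ['A','A', 'B', 'C', 'C'],'val' : [1, 2, 3, 4, 5]})

labels = df.labels.unique()

# For each label, the std dev of the values in all rows carrying a different label
std_map = {l:df[df.labels != l]["val"].std() for l in labels}

# Look up each row's label in the precomputed map
df["std_dev"] = df["labels"].map(std_map)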
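One thing to keep in mind with this approach: `Series.std()` defaults to the sample standard deviation (`ddof=1`), so a label whose "all other rows" set has fewer than two values ends up with `NaN`. A minimal sketch of that edge case follows; the helper name `std_excluding_label` and the frame `small` are made up for illustration and are not part of the answer:

```python
import pandas as pd

def std_excluding_label(df, label_col="labels", value_col="val", ddof=1):
    """Hypothetical helper: per-row std dev of the values whose label differs."""
    std_map = {
        label: df.loc[df[label_col] != label, value_col].std(ddof=ddof)
        for label in df[label_col].unique()
    }
    return df[label_col].map(std_map)

# Made-up data: excluding "A" leaves a single value, so its sample std dev is NaN
small = pd.DataFrame({"labels": ["A", "A", "B"], "val": [1, 2, 3]})
result = std_excluding_label(small)
```

Here the two `A` rows come out as `NaN`, while the `B` row gets the std dev of `[1, 2]`. Passing `ddof=0` would give the population std dev instead, if that is what you need.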

Alternatively, iterate over the rows and, for each row, filter the rows with other labels and compute their std dev (simpler, but the filter is recomputed for every row):

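import pandas as pd

df = pd.DataFrame({'labels' : ['A','A', 'B', 'C', 'C'],'val' : [1, 2, 3, 4, 5]})

# For each row, keep only rows with a different label and take the std dev of their values
df["std_dev"] = df.apply(lambda row: df[df.labels != row["labels"]]["val"].std(), axis=1)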

[Out]:
  labels  val   std_dev
0      A    1  1.000000
1      A    2  1.000000
2      B    3  1.825742
3      C    4  1.000000
4      C    5  1.000000
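If the frame has many labels, the per-label filtering can be avoided altogether. The sketch below is not from the answer above; it swaps in a different technique, deriving the sample variance of "all other rows" from whole-frame totals minus each label's own count, sum and sum of squares. The names `stats` and `std_dev_fast` are illustrative, and the sum-of-squares formula can lose precision when the values are large relative to their spread, so treat it as a starting point rather than a drop-in replacement:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'labels': ['A', 'A', 'B', 'C', 'C'], 'val': [1, 2, 3, 4, 5]})

# Per-label count, sum and sum of squares
stats = (
    df.assign(sq=df["val"] ** 2)
      .groupby("labels")
      .agg(n=("val", "count"), s=("val", "sum"), ss=("sq", "sum"))
)

# Whole-frame totals minus each label's share = statistics of "all other rows"
n_other = stats["n"].sum() - stats["n"]
s_other = stats["s"].sum() - stats["s"]
ss_other = stats["ss"].sum() - stats["ss"]

# Sample variance (ddof=1): (sum of squares - sum^2 / n) / (n - 1)
var_other = (ss_other - s_other ** 2 / n_other) / (n_other - 1)

# Fewer than two other rows has no sample std dev; clip guards tiny negative rounding errors
std_other = np.sqrt(var_other.where(n_other > 1).clip(lower=0))

# Map each row's label to the std dev of the rows with other labels
df["std_dev_fast"] = df["labels"].map(std_other)
```

On the example frame this should agree with the `std_dev` column shown above, up to floating-point rounding.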