I have a DataFrame with labels and values, as below:
df = pd.DataFrame({'labels' : ['A','A', 'B', 'C', 'C'],'val' : [1, 2, 3, 4, 5]})
Now I want to calculate, for each row, the std dev of the values in all rows that carry a different label:
rows with A: std dev of the B and C values (so the first 2 rows get the std dev of all other rows)
rows with B: std dev of the A and C values (so the third row gets the std dev of all other rows)
rows with C: std dev of the A and B values (so the last 2 rows get the std dev of all other rows)
How can I achieve this?
CodePudding user response:
Update: to optimise, precompute the std dev once per label instead of once per row:
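import pandas as pd
df = pd.DataFrame({'labels' : ['A','A', 'B', 'C', 'C'],'val' : [1, 2, 3, 4, 5]})
labels = df.labels.unique()
# For each label, std dev of the values in all rows that do NOT have that label
std_map = {l:df[df.labels != l]["val"].std() for l in labels}
# Look up each row's label in the precomputed mapping
df["std_dev"] = df["labels"].map(std_map)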
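One detail worth checking against your expected numbers: `Series.std()` in pandas defaults to the sample std dev (`ddof=1`). The sketch below is only an illustration; the helper name `std_excluding_label` is made up here and is not part of pandas. It shows how the value for label B changes if you want the population std dev (`ddof=0`) instead.

```python
import pandas as pd

df = pd.DataFrame({"labels": ["A", "A", "B", "C", "C"], "val": [1, 2, 3, 4, 5]})


def std_excluding_label(frame, label, ddof=1):
    """Std dev of `val` over all rows whose label differs from `label`.

    Hypothetical helper for illustration; ddof is passed straight to Series.std.
    """
    return frame.loc[frame["labels"] != label, "val"].std(ddof=ddof)


# Sample std dev (pandas default) over the rows not labelled B: values 1, 2, 4, 5
sample_std_b = std_excluding_label(df, "B")

# Population std dev over the same rows
population_std_b = std_excluding_label(df, "B", ddof=0)
```

With the default, B gives roughly 1.8257; with `ddof=0` it gives roughly 1.5811, so pick whichever definition matches your use case.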
Original answer: iterate over the dataframe and, for each row, filter the rows with other labels and compute their std dev:
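import pandas as pd
df = pd.DataFrame({'labels' : ['A','A', 'B', 'C', 'C'],'val' : [1, 2, 3, 4, 5]})
# Row-wise: re-filters the whole frame for every row, so it is slow on large data
df["std_dev"] = df.apply(lambda row: df[df.labels != row["labels"]]["val"].std(), axis=1)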
[Out]:
  labels  val   std_dev
0      A    1  1.000000
1      A    2  1.000000
2      B    3  1.825742
3      C    4  1.000000
4      C    5  1.000000
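If the frame is large and has many distinct labels, filtering the frame once per label can still add up. A possible alternative, sketched here as an assumption rather than as part of the answer above, is to derive the "all other labels" statistics from totals: subtract each label's count, sum and sum of squares from the overall ones, then apply the sample variance formula. The column name `std_dev_fast` is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"labels": ["A", "A", "B", "C", "C"], "val": [1, 2, 3, 4, 5]})

# Per-label count, sum and sum of squares, broadcast back onto every row.
grouped = df["val"].groupby(df["labels"])
n_own = grouped.transform("count")
sum_own = grouped.transform("sum")
sq_own = (df["val"] ** 2).groupby(df["labels"]).transform("sum")

# Overall totals minus the row's own label give the statistics of all other labels.
n_rest = len(df) - n_own
sum_rest = df["val"].sum() - sum_own
sq_rest = (df["val"] ** 2).sum() - sq_own

# Sample variance (ddof=1): (sum of squares - sum^2 / n) / (n - 1).
var_rest = (sq_rest - sum_rest ** 2 / n_rest) / (n_rest - 1)

# Clip tiny negative values caused by floating-point rounding before the square root.
df["std_dev_fast"] = np.sqrt(var_rest.clip(lower=0))
```

On the sample data this reproduces the values shown above. Two caveats: the sum-of-squares formula is less numerically stable than `Series.std()` when the values are large relative to their spread, and a label whose complement has fewer than two rows will not yield a finite result.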