Cluster Class Value
0 0 10 1
1 0 11 1
2 0 14 3
3 0 18 1
4 0 26 1
5 0 29 1
6 0 30 1
7 1 0 2
8 1 19 1
9 1 20 1
10 1 21 2
11 1 36 1
12 1 26 1
13 1 27 1
14 1 37 2
15 1 33 1
This table shows which class falls under which cluster. For example, Classes 10, 11, 14 and so on have fallen into Cluster 0. The Value column indicates how many members of that class are in the cluster; for example, 3 members of Class 14 have fallen into Cluster 0.
My desired output looks like this:
Cluster Class Value Cluster_Sum
0 0 10 1 9
1 0 11 1 9
2 0 14 3 9
3 0 18 1 9
4 0 26 1 9
5 0 29 1 9
6 0 30 1 9
The same applies to the other clusters. My final aim is to make a column 'Precision', computed for each row as:
df['Precision'] = df['Value'] / df['Cluster_Sum']
How can I do that in Python?
EDIT: It works perfectly. Thanks for your help.
Ultimately, this is my goal. For each class, the total number of members is fixed, e.g. Class 1: 10, Class 2: 12, and so on. I need to add a column 'Class_Sum' that holds the total for each class. Then I can compute the Recall with:
`df['Recall'] = df['Value']/ df['Class_Sum']`
My question is how I can add this information
Class 1 10
Class 2 12
Class 3 23
Class 4 11
Class 5 17
Class 6 13
Class 7 16
Class 8 15
Class 9 14
Class 10 18
Class 11 09
Class 12 07
Class 13 16
Class 14 21
Class 15 17
Class 16 23
Class 17 10
Class 18 21
Class 19 12
Class 20 45
Class 21 12
Class 22 12
Class 23 15
Class 24 11
Class 25 09
Class 26 11
Class 27 08
Class 28 10
Class 29 11
Class 30 19
Class 31 17
Class 32 15
Class 33 12
Class 34 07
Class 35 06
Class 36 14
Class 37 13
Class 38 16
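to my DataFrame, so that it ends up with the columns Cluster, Class, Class_Sum, Value, Cluster_Sum, Precision and Recall. For the Cluster 0 rows, the Class and Class_Sum values would be:
Class Class_Sum
10 18
11 09
14 21
18 21
26 11
29 11
30 19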
How can it be done?
Answer:
Try with `groupby`:
df["Cluster_Sum"] = df.groupby("Cluster")["Value"].transform("sum")
>>> df
Cluster Class Value Cluster_Sum
0 0 10 1 9
1 0 11 1 9
2 0 14 3 9
3 0 18 1 9
4 0 26 1 9
5 0 29 1 9
6 0 30 1 9
7 1 0 2 12
8 1 19 1 12
9 1 20 1 12
10 1 21 2 12
11 1 36 1 12
12 1 26 1 12
13 1 27 1 12
14 1 37 2 12
15 1 33 1 12
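To illustrate why `transform("sum")` is used here instead of a plain aggregation, here is a small sketch built from the question's table. A plain `sum()` after `groupby` collapses each cluster to a single row, whereas `transform("sum")` returns a Series aligned with the original index, repeating each cluster's total on every row. That alignment is what makes the row-wise division for Precision work.

```python
import pandas as pd

# The question's table: Cluster, Class, Value
df = pd.DataFrame({
    "Cluster": [0] * 7 + [1] * 9,
    "Class": [10, 11, 14, 18, 26, 29, 30, 0, 19, 20, 21, 36, 26, 27, 37, 33],
    "Value": [1, 1, 3, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1],
})

# Aggregation: one row per cluster (indexed by Cluster)
per_cluster = df.groupby("Cluster")["Value"].sum()
print(per_cluster)

# transform: same length and index as df, the group total repeated per row
df["Cluster_Sum"] = df.groupby("Cluster")["Value"].transform("sum")
print(df)
```

The aggregated Series has only two rows (totals 9 and 12), so it cannot be divided into `df["Value"]` row by row without a merge, while the transformed column can.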
Answer:
`groupby` with `transform("sum")` is your friend here:
df['Precision'] = df["Value"] / df.groupby("Cluster")["Value"].transform("sum")
Output:
>>> df
Cluster Class Value Precision
0 0 10 1 0.111111
1 0 11 1 0.111111
2 0 14 3 0.333333
3 0 18 1 0.111111
4 0 26 1 0.111111
5 0 29 1 0.111111
6 0 30 1 0.111111
7 1 0 2 0.166667
8 1 19 1 0.083333
9 1 20 1 0.083333
10 1 21 2 0.166667
11 1 36 1 0.083333
12 1 26 1 0.083333
13 1 27 1 0.083333
14 1 37 2 0.166667
15 1 33 1 0.083333
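For the follow-up in the EDIT (Recall), one possible approach is to store the per-class totals in a dict and `map` them onto the Class column. This is a sketch that assumes the totals can be typed in by hand. The dict name `class_totals` is illustrative only, and its contents are transcribed from the question's list. Note that the table contains a Class 0 row (Cluster 1), but the list starts at Class 1. For that row, `map` yields NaN, so its Class_Sum and Recall are NaN until a total for Class 0 is supplied.

```python
import pandas as pd

# The question's table: Cluster, Class, Value
df = pd.DataFrame({
    "Cluster": [0] * 7 + [1] * 9,
    "Class": [10, 11, 14, 18, 26, 29, 30, 0, 19, 20, 21, 36, 26, 27, 37, 33],
    "Value": [1, 1, 3, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1],
})

# Hypothetical name; values copied from the question's "Class N total" list
class_totals = {
    1: 10, 2: 12, 3: 23, 4: 11, 5: 17, 6: 13, 7: 16, 8: 15, 9: 14,
    10: 18, 11: 9, 12: 7, 13: 16, 14: 21, 15: 17, 16: 23, 17: 10,
    18: 21, 19: 12, 20: 45, 21: 12, 22: 12, 23: 15, 24: 11, 25: 9,
    26: 11, 27: 8, 28: 10, 29: 11, 30: 19, 31: 17, 32: 15, 33: 12,
    34: 7, 35: 6, 36: 14, 37: 13, 38: 16,
}

# Look up each row's class total; classes missing from the dict become NaN
df["Class_Sum"] = df["Class"].map(class_totals)
# Total members per cluster, repeated on every row of that cluster
df["Cluster_Sum"] = df.groupby("Cluster")["Value"].transform("sum")

df["Precision"] = df["Value"] / df["Cluster_Sum"]
df["Recall"] = df["Value"] / df["Class_Sum"]

# Reorder to the column layout described in the question
df = df[["Cluster", "Class", "Class_Sum", "Value", "Cluster_Sum", "Precision", "Recall"]]
print(df)
```

If the totals already live in a second DataFrame with Class and Class_Sum columns, `df.merge(totals, on="Class", how="left")` is an equivalent alternative to `map`.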