I have an Excel sheet including 2000
tweets. Five annotators labeled each tweet as Hate
, Neutral
, and Counterhate
. I want to create a new column indicating the majority voting on these labels for each tweet.
For example, if for a tweet, three annotators voted on hate
, one voted Counterhate
, and one Neutral
, then the majority voting of this tweet should be Hate
.
- Question1: Please let me know what the formula is to do this.
- Question2: what is the majority of voting for a tweet that the number of two classes are same? For example, two annotators voted as
Hate
, two asCounterhate
, and oneNeutral
?
Following is the screenshot of my excel sheet and the formula I wrote, but it returns false.
dex | Text | Label1 | Label2 | Label3 | #of Hate | #of Neutral | #of Counterhate | majority | Agreement |
---|---|---|---|---|---|---|---|---|---|
1 | tweet1 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.6 |
2 | tweet2 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.6 |
3 | tweet3 | Neutral | Neutral | Neutral | 0 | 3 | 0 | FALSE | 1 |
4 | tweet4 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.6 |
5 | tweet5 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.4 |
6 | tweet6 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.4 |
7 | tweet7 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 1 |
8 | tweet8 | Neutral | Hate | Neutral | 1 | 2 | 0 | FALSE | 0.6 |
9 | tweet9 | Hate | Hate | Hate | 3 | 0 | 0 | FALSE | 0.6 |
10 | tweet10 | Counterhate | Neutral | Counterhate | 0 | 1 | 2 | FALSE | 0.4 |
I rewrite my formula:
=OR(IF(MAX(F2:H2)=F2,"Hate"),IF(MAX(F2:H2)=G2,"Neutral"),
IF(MAX(F2:H2)=H2,"Counterhate"))
CodePudding user response:
You can try the following for Question 1 in K2
cell:
=LET(values, {"Hate","Neutral","Counterhate"},
counts, COUNTIF($C2:$E2,values),
INDEX(values, XMATCH(MAX(counts),counts)))
Notes:
- Question 2 is not excel related, it is business specific rule you need to define and then to apply that rule in Excel.
- There is no need to use the helper additional columns you have for that. I obtain the result just based on Label columns:
LET
is used for easier maintain the formula, creating variables representing portion of the formula will be repeated more than once.
counts, COUNTIF($C2:$E2,values)
Counts how many times the input range $C2:$E2
contains the values
.
INDEX(values, XMATCH(MAX(counts),counts))
Calculates the index position in values
from the maximum number of repetitions.
The rest is just to expand down the formula in K2
cell.
Tip: If the business rule for Question2 is to consider for example Neutral
, if two or more values have the same count. Then the previous formula can be modified as follow:
=LET(values, {"Hate","Neutral","Counterhate"},counts,
COUNTIF($C2:$E2,values), maxNum, SUM(--
ISNUMBER(XMATCH(counts,MAX(counts)))),
IF(maxNum > 1,"Neutral",INDEX(values, XMATCH(MAX(counts),counts))))