How to count the total entries matching a "string" in R

I have created a data frame of 50 rows. I labelled values > 0.5 as fraud and the rest as not fraud. I then put the rows labelled not fraud into a separate object called iffraud.

num = runif(50)
class_df = data.frame(num)
print(class_df)

class_df$type = ifelse(class_df$num > 0.5, 'fraud',"not fraud")
print(class_df)

iffraud = class_df[class_df["type"] == "not fraud"]

How can I count how many values are stored in iffraud?

CodePudding user response:

This can be done with table(), reading the count off the "not fraud" entry:

set.seed(1)
num = runif(50)
class_df = data.frame(num)
print(class_df)
class_df$type = ifelse(class_df$num > 0.5, 'fraud',"not fraud")
print(class_df)
#>           num      type
#> 1  0.26550866 not fraud
#> 2  0.37212390 not fraud
#> 3  0.57285336     fraud
#> 4  0.90820779     fraud
#> 5  0.20168193 not fraud
#> 6  0.89838968     fraud
#> 7  0.94467527     fraud
#> 8  0.66079779     fraud
#> 9  0.62911404     fraud
#> 10 0.06178627 not fraud
#> 11 0.20597457 not fraud
#> 12 0.17655675 not fraud
#> 13 0.68702285     fraud
#> 14 0.38410372 not fraud
#> 15 0.76984142     fraud
#> 16 0.49769924 not fraud
#> 17 0.71761851     fraud
#> 18 0.99190609     fraud
#> 19 0.38003518 not fraud
#> 20 0.77744522     fraud
#> 21 0.93470523     fraud
#> 22 0.21214252 not fraud
#> 23 0.65167377     fraud
#> 24 0.12555510 not fraud
#> 25 0.26722067 not fraud
#> 26 0.38611409 not fraud
#> 27 0.01339033 not fraud
#> 28 0.38238796 not fraud
#> 29 0.86969085     fraud
#> 30 0.34034900 not fraud
#> 31 0.48208012 not fraud
#> 32 0.59956583     fraud
#> 33 0.49354131 not fraud
#> 34 0.18621760 not fraud
#> 35 0.82737332     fraud
#> 36 0.66846674     fraud
#> 37 0.79423986     fraud
#> 38 0.10794363 not fraud
#> 39 0.72371095     fraud
#> 40 0.41127443 not fraud
#> 41 0.82094629     fraud
#> 42 0.64706019     fraud
#> 43 0.78293276     fraud
#> 44 0.55303631     fraud
#> 45 0.52971958     fraud
#> 46 0.78935623     fraud
#> 47 0.02333120 not fraud
#> 48 0.47723007 not fraud
#> 49 0.73231374     fraud
#> 50 0.69273156     fraud
iffraud = class_df[class_df["type"] == "not fraud"]
a <- table(iffraud)
a
#> iffraud
#> 0.01339033 0.02333120 0.06178627 0.10794363 0.12555510 0.17655675 0.18621760 
#>          1          1          1          1          1          1          1 
#> 0.20168193 0.20597457 0.21214252 0.26550866 0.26722067 0.34034900 0.37212390 
#>          1          1          1          1          1          1          1 
#> 0.38003518 0.38238796 0.38410372 0.38611409 0.41127443 0.47723007 0.48208012 
#>          1          1          1          1          1          1          1 
#> 0.49354131 0.49769924  not fraud 
#>          1          1         23
a[names(a)=="not fraud"]
#> not fraud 
#>        23

Created on 2022-07-12 by the reprex package (v2.0.1)
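
A note on why the num values show up in that table: as far as I can tell, class_df[class_df["type"] == "not fraud"] has no comma, so the data frame is indexed by a logical matrix and comes back as a character vector that mixes the num and type values of the matching rows. A minimal sketch of an alternative, assuming the goal is to keep whole rows and count them (the column-name construction and the <- assignments are my own choices, not the asker's code):

```r
set.seed(1)
class_df <- data.frame(num = runif(50))
class_df$type <- ifelse(class_df$num > 0.5, "fraud", "not fraud")

# The trailing comma selects rows and keeps every column,
# so iffraud stays a data frame instead of a flat character vector
iffraud <- class_df[class_df$type == "not fraud", ]

# Number of rows labelled "not fraud"
nrow(iffraud)
#> [1] 23
```

With iffraud kept as a data frame, nrow() gives the count directly and no table lookup is needed.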

CodePudding user response:

Using a logical comparison only (TRUE counts as 1 when summed, so the sum is the number of matching rows):

sum(class_df["type"] == "not fraud")
#> [1] 23
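
If the counts for both labels are wanted at once, table() can also be applied to the type column itself rather than to iffraud. This is only a sketch under the same set.seed(1) setup as the first answer; the counts variable is a name introduced here for illustration:

```r
set.seed(1)
class_df <- data.frame(num = runif(50))
class_df$type <- ifelse(class_df$num > 0.5, "fraud", "not fraud")

# One count per label, taken directly from the type column
counts <- table(class_df$type)

# Look up a single label by name
counts[["not fraud"]]  # 23 with this seed, matching the answers above
counts[["fraud"]]      # the remaining 27 of the 50 rows
```

This avoids building the intermediate iffraud object when only the counts are needed.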