Home > other >  counting observations greater than 20 in R
counting observations greater than 20 in R

Time:10-26

I have a dataset df in R and trying to get the number of observations greater than 20

sample input df:
df <- data.frame(Ensembl_ID = c("ENSG00000284662", "ENSG00000186827", "ENSG00000186891", "ENSG00000160072", "ENSG00000041988"), FS_glm_1_L_Ad_N1_233_233 = c(NA, "11.0704011098281", "18.5580644869131", NA, NA), FS_glm_1_L_Ad_N10_36_36 = c("25.5660669439994", NA, "17.7371918093936", "17.15620204154", NA), FS_glm_1_L_Ad_N2_115_115 = c("26.5660644083686", NA, "11.4006170885388", "17.9862691299736", "9.83546459757003" ), FS_glm_1_L_Ad_N3_84_84 = c("26.5660644053515", NA, "10.9591563938286", NA, NA), FS_glm_1_L_Ad_N4_65_65 = c("26.5660642078305", NA, "11.1498422647029", "10.5876449860129", "9.84781577969005"), FS_glm_1_L_Ad_N5_64_64 = c("26.5660688201853", NA, "18.613395947125", "10.5753792680759", "11.059101026016"), FS_glm_1_L_Ad_N6_55_55 = c("26.5660644039101", NA, "18.478237966938", "10.543187719545", NA), FS_glm_1_L_Ad_N7_32_32 = c("25.5660669436648", NA, "17.9467280294446", "10.0328888122706", NA), FS_glm_1_L_Ad_N8_31_31 = c("25.566069252448", NA, "17.6805603365895", "17.3419854603055", "9.81610669984747"))

class(df)
[1] "data.frame"

I tried

length(which(as.vector(df[,-1]) > 20))
[1] 11

and

sum(df[,-1] > 20, na.rm=TRUE)
[1] 11

However, the real occurrence is only 8 times instead of 11 why so?

The same script works correctly in another data frame but not in this df.

CodePudding user response:

The data is character in this dataframe and not numeric. When numbers are characters weird things happen.

"2" > "13"
#[1] TRUE

Change the data to numeric before using sum.

df[-1] <- lapply(df[-1], as.numeric)
sum(df[,-1] > 20, na.rm=TRUE)
#[1] 8
  • Related