I'm trying to summarise all columns at once using this condition: find only the values greater than 5, then divide by the length of the column. But I could not get it to work.
Here is what I did:
x1 is only one column in my data and I have 100 columns.
#create data frame
df <- data.frame(x1 = c(7, 3, 1, 9, 12, 8),
x2 = c(7, 5, 6, 1, 4, 4))
result1 <- sum(df$x1 > 5) / length(df$x1)
# 7 is greater than 5
# 9 is greater than 5
# 12 is greater than 5
# 8 is greater than 5
# so 4 values qualify, and 4 divided by the 6 values in the column is about 0.667
print(result1) # 0.6666667
Answer:
Update
You can just use colMeans on the logical condition, since the mean of a logical vector is the proportion of TRUE values:
colMeans(df > 5)
# x1 x2
# 0.6666667 0.3333333
Or with dplyr:
library(dplyr)
df %>%
summarise(across(everything(), ~ mean(.x > 5)))
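As a quick sanity check, the colMeans result can be compared against the per-column formula from the question. This is only a minimal sketch on the question's example data; the names manual and vectorised are illustrative and not part of the original code.

```r
df <- data.frame(x1 = c(7, 3, 1, 9, 12, 8),
                 x2 = c(7, 5, 6, 1, 4, 4))

# Question's formula applied to each column:
# count of values > 5 divided by the column length
manual <- sapply(df, function(x) sum(x > 5) / length(x))

# Vectorised version: df > 5 is a logical matrix, and the mean of
# TRUE/FALSE values is the proportion of TRUE
vectorised <- colMeans(df > 5)

all.equal(manual, vectorised)
```

Both approaches should agree, so for 100 columns the single colMeans call is enough.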
Original Answer
It's a little unclear what the expected output should be. To get the sum of all values greater than 5 for each column, we can first find the values that are greater than 5 (i.e., mydata > 5). Then we multiply the original data frame by that logical matrix using *, which coerces each logical to 1 if the condition is met and 0 otherwise, so values that fail the condition become 0. Finally, we take the sum of each column.
mydata <- mtcars[1:10, 1:5]
colSums(mydata * (mydata > 5))
# mpg cyl disp hp drat
# 203.7 46.0 2086.1 1228.0 0.0
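To see why the multiplication works, here is a small sketch on a single made-up vector (x is illustrative and not taken from the data above):

```r
x <- c(7, 3, 1, 9)

x > 5              # TRUE FALSE FALSE TRUE
x * (x > 5)        # 7 0 0 9: values failing the condition become 0
sum(x * (x > 5))   # 16
sum(x[x > 5])      # 16, the same result via logical subsetting
```

Multiplying keeps the data frame's shape, which is what lets colSums work on all columns at once.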
If you just want the mean using the full number of rows, we can use colMeans with the same logic:
colMeans(mydata * (mydata > 5))
# mpg cyl disp hp drat
# 20.37 4.60 208.61 122.80 0.00
However, if you want to divide by the number of values greater than 5 instead, we could do something like this:
apply(mydata, 2, function(x)
sum(x * (x > 5)) / sum(x > 5))
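Note that a column with no values greater than 5 (drat in this subset) gives 0 / 0, which R evaluates to NaN. Below is a minimal sketch of one way to handle that case by returning NA instead; the helper name mean_above and its threshold argument are hypothetical, not part of the original answer.

```r
mydata <- mtcars[1:10, 1:5]

# Hypothetical helper: mean of the values above a threshold,
# or NA when no value in the column qualifies
mean_above <- function(x, threshold = 5) {
  keep <- x > threshold
  if (!any(keep)) return(NA_real_)
  mean(x[keep])
}

res <- sapply(mydata, mean_above)
res
```

Whether NA, NaN, or 0 is the right result for an empty selection depends on how the output is used downstream.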