I'm trying to calculate the percentile rank for 2021, relative to the 1981-2021 climatology for June/July/August average temperatures. My data looks like the sample below, but is much larger:
1981 | 1982 | 1983 | 2021 |
---|---|---|---|
22 | 34 | 40 | 42 |
25 | 36 | 32 | 33 |
For each row, I need to count the values that are lower than or equal to the 2021 value. For the top row that would be 3, as all values are less than 42, while for the second row it would be 2.
I have written some code which tries to do this, but the problem I'm having is getting the function to use the 2021 value for each row and not just for one value.
L <- apply(Temperature_df[,(3:50)],1,function(x)
{
sum(x <= Temperature_df[,50]) #50 is the 50th column which is always 2021
})
# The function works if I compare each row to a single 2021 value,
# e.g. sum(x <= Temperature_df[1,50]), but I need to compare the values of
# each row against the 2021 value for that row.
Any ideas would be greatly appreciated.
Answer:
A simple rowSums will do it. Pay attention to the way column 2021 is extracted, with double [[.
Temperature_df <- read.table(text = "
1981 1982 1983 2021
22 34 40 42
25 36 32 33
", header = TRUE, check.names = FALSE)
col2021 <- 4
Temperature_df[-col2021] <= Temperature_df[[col2021]]
#> 1981 1982 1983
#> [1,] TRUE TRUE TRUE
#> [2,] TRUE FALSE TRUE
rowSums(Temperature_df[-col2021] <= Temperature_df[[col2021]])
#> [1] 3 2
Created on 2022-03-18 by the reprex package (v2.0.1)
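As for why the original apply() call did not work: inside the anonymous function, x is a single row, but Temperature_df[,50] is still the whole 2021 column, so each row ends up being compared against every row's 2021 value. A minimal sketch of an apply() variant that uses the row's own value is below, reusing the toy data from above. The conversion to a percentage is only an assumption about what "percentile rank" should mean here (count divided by the number of comparison years); n_le, n_years and pct_rank are illustrative names, not part of the original code.

```r
# Same toy data as above; the real data frame is assumed to have more year columns.
Temperature_df <- read.table(text = "
1981 1982 1983 2021
22 34 40 42
25 36 32 33
", header = TRUE, check.names = FALSE)

col2021 <- 4  # position of the 2021 column (50 in the full data set)

# Inside apply(), x is one row, so the row's own 2021 value is x[col2021].
# Dropping that element from the left-hand side keeps 2021 out of the count.
n_le <- apply(Temperature_df, 1, function(x) sum(x[-col2021] <= x[col2021]))

# Assumption: percentile rank = 100 * (years at or below 2021) / (years compared).
# Adjust the denominator if 2021 itself should be part of the climatology.
n_years <- ncol(Temperature_df) - 1
pct_rank <- 100 * n_le / n_years
```

For the toy data this gives the same counts as the rowSums version (3 and 2). With many rows, rowSums is likely the faster of the two, since it avoids calling an R function once per row.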