I am new to R. I would like to calculate the mean for each row of a dataframe, but using different subset of columns for each row. I have two extra-columns providing me the names of the column that represent the "start" and the "end" that I should use to calculate each mean, respectively.
Let's take this example
dframe <- data.frame(a=c("2","3","4", "2"), b=c("1","3","6", "2"), c=c("4","5","6", "3"), d=c("4","2","8", "5"), e=c("a", "c", "a", "b"), f=c("c", "d", "d", "c"))
dframe
Which provides the following dataframe:
a b c d e f
1 2 1 4 4 a c
2 3 3 5 2 c d
3 4 6 6 8 a d
4 2 2 3 5 b c
The columns e and f represent the first and last column I use to calculate the mean for each row. For example, on line 1, the mean would be calculated including column a, b, c ((2 1 4)/3 -> 2.3) So I would like to obtain the following output:
a b c d e f mean
1 2 1 4 4 a c 2.3
2 3 3 5 2 c d 3.5
3 4 6 6 8 a d 6
4 2 2 3 5 b c 2.5
I learnt how to create the indices, and I want then to use RowMeans, but I cannot find the correct arguments.
dframe %>%
mutate(e_indice = match(e, colnames(dframe)))%>%
mutate(f_indice = match(f, colnames(dframe)))%>%
mutate(mean = RowMeans(????, na.rm = TRUE))
Thanks a lot for your help
CodePudding user response:
One dplyr
option could be:
dframe %>%
rowwise() %>%
mutate(mean = rowMeans(cur_data()[match(e, names(.)):match(f, names(.))]))
a b c d e f mean
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
1 2 1 4 4 a c 2.33
2 3 3 5 2 c d 3.5
3 4 6 6 8 a d 6
4 2 2 3 5 b c 2.5
CodePudding user response:
I would define a helper function that lets you slice the indices you want from a matrix.
rowSlice <- function(x, start, stop) {
replace(x, col(x) < start | col(x) > stop, NA)
}
rowSlice(matrix(1, 4, 4), c(1, 3, 1, 2), c(3, 4, 4, 3))
#> [,1] [,2] [,3] [,4]
#> [1,] 1 1 1 NA
#> [2,] NA NA 1 1
#> [3,] 1 1 1 1
#> [4,] NA 1 1 NA
Then use across()
to select the relvant columns, slice them,
and take the rowMeans()
.
library(dplyr)
dframe <- data.frame(
a = c(2, 3, 4, 2),
b = c(1, 3, 6, 2),
c = c(4, 5, 6, 3),
d = c(4, 2, 8, 5),
e = c("a", "c", "a", "b"),
f = c("c", "d", "d", "c")
)
dframe %>%
mutate(ei = match(e, colnames(dframe))) %>%
mutate(fi = match(f, colnames(dframe))) %>%
mutate(
mean = across(a:d) %>%
rowSlice(ei, fi) %>%
rowMeans(na.rm = TRUE)
)
#> a b c d e f ei fi mean
#> 1 2 1 4 4 a c 1 3 2.333333
#> 2 3 3 5 2 c d 3 4 3.500000
#> 3 4 6 6 8 a d 1 4 6.000000
#> 4 2 2 3 5 b c 2 3 2.500000
CodePudding user response:
A base R solution. First, set columns to numeric. Then create a list of the columns on which to apply the mean. Then apply mean on selected columns.
s <- mapply(seq, match(dframe$e, colnames(dframe)), match(dframe$f, colnames(dframe)))
dframe$mean <- lapply(seq(nrow(dframe)), function(x) rowMeans(dframe[x, s[[x]]]))
a b c d e f mean
1 2 1 4 4 a c 2.333333
2 3 3 5 2 c d 3.5
3 4 6 6 8 a d 6
4 2 2 3 5 b c 2.5
CodePudding user response:
A base R approach using apply
dframe$mean <- apply(dframe, 1, function(x)
mean(as.numeric(x[which(names(x) == x["e"]) : which(names(x) == x["f"])])))
dframe
a b c d e f mean
1 2 1 4 4 a c 2.333333
2 3 3 5 2 c d 3.500000
3 4 6 6 8 a d 6.000000
4 2 2 3 5 b c 2.500000