Just starting to use R and am feeling a bit confused. Suppose I have three columns
data = data.frame(id=c(101, 102, 103),column1=c(2, 4, 9),
column2=c(3, 4, 2), column3=c(5, 15, 7))
How can I create a new column (e.g., colmean) that is the mean of the two columns closest in value? I thought about doing a bunch of ifelse statements, but that seemed unnecessarily messy.
In this case, for instance, colmean=c(2.5, 4, 8)
.
CodePudding user response:
Here is a version with a loop:
data = data.frame(id=c(101, 102, 103),column1=c(2, 4, 9),
column2=c(3, 4, 2), column3=c(5, 15, 7))
data$colmean <- NaN # set up empty column for results
for(i in seq(nrow(data))){
data.i <- data[i,-1] # get ith row
d <- as.matrix(dist(c(data.i))) # get distances between values
diag(d) <- NaN # replace diagonal of distance matrix with NaN
hit <- which.min(d) # identify value of lowest distance
pos <- c(row(d)[hit], col(d)[hit]) # get the position (i.e. the values that are closest)
data$colmean[i] <- mean(unlist(data.i[pos])) # calculate mean
}
data
# id column1 column2 column3 colmean
# 1 101 2 3 5 2.5
# 2 102 4 4 15 4.0
# 3 103 9 2 7 8.0
CodePudding user response:
Borrowing the function findClosest()
created here by @Cole, we can do the following,
findClosest <- function(x, n) {
x <- sort(x)
x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]
}
colMeans(apply(data[-1], 1, function(i)findClosest(i, 2)))
#[1] 2.5 4.0 8.0
CodePudding user response:
Here's a self-contained solution, based on the tidyverse, that is independent of the number of columns to be compared.
library(tidyverse)
data %>%
# Add the means of smallest pairwise differences to the input data
bind_cols(
data %>%
# Make the data tidy (and hence independent of the number of "column"s)
pivot_longer(starts_with("column")) %>%
# For each id/row (replace with rowwise() if appropriate)
group_by(id) %>%
group_map(
function(.x, .y) {
# Form a tibble of all pairwise ciombinations of values
as_tibble(t(combn(.x$value, 2))) %>%
# Calculate pairwise differences
mutate(difference = abs(V1 - V2)) %>%
# Find the smallest pairwise difference
arrange(difference) %>%
head(1) %>%
# Calculate the mean of this pair
pivot_longer(starts_with("V")) %>%
summarise(colmean=mean(value))
}
) %>%
# Convert list of values to column
bind_rows()
)
id column1 column2 column3 colmean
1 101 2 3 5 2.5
2 102 4 4 15 4.0
3 103 9 2 7 8.0