Add new column with column names with values greater and lower than mean

Time:05-30

I have a data frame:

set.seed(100)
A <- floor(runif(5, min=0, max=10))
B <- floor(runif(5, min=0, max=10))
C <- floor(runif(5, min=0, max=10))
D <- floor(runif(5, min=0, max=10))
df <- data.frame(A,B,C,D)
df$ms <- rowMeans(df)
df
  A B C D   ms
1 3 4 6 6 4.75
2 2 8 8 2 5.00
3 5 3 2 3 3.25
4 0 5 3 3 2.75
5 4 1 7 6 4.50

Now I'd like to add two columns, lower and greater, containing column names. For each row, lower should list which of columns A and B have a value below the row mean (ms), and greater should list which of columns C and D have a value above it. Desired result:

  A B C D   ms  lower greater
1 3 4 6 6 4.75  A,B   C,D
2 2 8 8 2 5.00  A     C
3 5 3 2 3 3.25  B     NA
4 0 5 3 3 2.75  A     C,D
5 4 1 7 6 4.50  A,B   C,D

I tried to do this with which() but got stuck. Could you please give me a hint?

lapply(apply(df,1, function(x) which(df$ms)),names)

CodePudding user response:

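You can use apply() in base R. Run it on the numeric columns only: once df$lower has been added, apply(df, 1, ...) would coerce each row to character, and the comparison with ms would then be done on strings. Rows with no match get an empty string.

num <- df[c("A", "B", "C", "D", "ms")]  # numeric columns only, so apply() sees numeric rows
df$lower <- apply(num, 1, function(x) paste(names(which(x[c("A", "B")] < x["ms"])), collapse = ", "))
df$greater <- apply(num, 1, function(x) paste(names(which(x[c("C", "D")] > x["ms"])), collapse = ", "))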

  A B C D   ms lower greater
1 3 4 6 6 4.75  A, B    C, D
2 2 8 8 2 5.00     A       C
3 5 3 2 3 3.25     B        
4 0 5 3 3 2.75     A    C, D
5 4 1 7 6 4.50  A, B    C, D
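
The output above leaves an empty string where no column qualifies, while the question asks for NA. Here is a minimal sketch of one way to get NA directly. It assumes the data exactly as printed in the question (typed in by hand rather than regenerated with runif), and names_or_na is a hypothetical helper introduced only for this illustration.

```r
# Data copied from the table in the question (assumption: not regenerated with runif)
df <- data.frame(A = c(3, 2, 5, 0, 4),
                 B = c(4, 8, 3, 5, 1),
                 C = c(6, 8, 2, 3, 7),
                 D = c(6, 2, 3, 3, 6))
df$ms <- rowMeans(df)

# Hypothetical helper: names of the TRUE entries joined by sep, or NA if there are none
names_or_na <- function(hit, sep = ",") {
  nm <- names(hit)[hit]
  if (length(nm) == 0) NA_character_ else paste(nm, collapse = sep)
}

# Work on a numeric matrix; comparing it with df$ms recycles the means down each column,
# so every value is compared with the mean of its own row
vals <- as.matrix(df[c("A", "B", "C", "D")])
lower   <- apply(vals[, c("A", "B")] < df$ms, 1, names_or_na)
greater <- apply(vals[, c("C", "D")] > df$ms, 1, names_or_na)

# Add the character columns only after both comparisons are done
df$lower   <- lower
df$greater <- greater
df
```

Computing both vectors before assigning them keeps the comparisons numeric, and returning NA_character_ from the helper avoids a separate replacement step afterwards.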

CodePudding user response:

In base R, I guess you can do something like:

# Requires R >= 4.1 for the native pipe |> and the \(x) lambda shorthand
df$lower <- lapply(df[1:2], \(x) x < df$ms) |>
  data.frame() |>
  apply(1, \(x) paste(names(x)[x], collapse = ","))

df$greater <- lapply(df[3:4], \(x) x > df$ms) |>
  data.frame() |>
  apply(1, \(x) paste(names(x)[x], collapse = ","))

# Replace zero-length strings (rows with no matching column) with NA
df[df == ""] <- NA

df
#   A B C D   ms lower greater
# 1 3 4 6 6 4.75   A,B     C,D
# 2 2 8 8 2 5.00     A       C
# 3 5 3 2 3 3.25     B    <NA>
# 4 0 5 3 3 2.75     A     C,D
# 5 4 1 7 6 4.50   A,B     C,D