I have been trying to increment a value in R using the apply function. I am currently doing:
t(apply(df[rownames,], 1, fun))
where fun is as follows (I want to check whether the value in my third column is less than 10 and, if so, increment it by 1):
fun <- function(x) {
  if (x[3] >= 10) {
    x[2:3] <- 0
  } else {
    x[3] <- x[3] + 1  # This does not work
  }
  print(x)
}
Dataframe:
C1 | C2 | C3 | C4 |
---|---|---|---|
1 | 0 | 0 | 0 |
2 | 1 | 1 | 0 |
3 | 0 | 0 | 0 |
4 | 1 | 1 | 0 |
Expected Output:
C1 | C2 | C3 | C4 |
---|---|---|---|
1 | 0 | 0 | 0 |
2 | 1 | 2 | 0 |
3 | 0 | 0 | 0 |
4 | 1 | 2 | 0 |
Error: Error in x[3] + 1 : non-numeric argument to binary operator
CodePudding user response:
You could use ifelse for this, e.g.
library(dplyr)

read.table(text = "C1 C2 C3 C4
1 0 0 0
2 1 1 0
3 0 0 0
4 1 1 0", header = TRUE) %>%
  # increment C3 only where it is non-zero and below 10
  mutate(C3 = ifelse(C3 < 10 & C3 != 0, C3 + 1, C3))
C1 C2 C3 C4
1 1 0 0 0
2 2 1 2 0
3 3 0 0 0
4 4 1 2 0
CodePudding user response:
Up front:
- Your logic is inconsistent with your expected output. I think you may also need && x[3] > 0 in your incrementing step, though I'm not entirely sure; I'll ignore this for the code below.
- A function that is meant to modify data should always return the data, not (just) print it. In my opinion, functions should only print(.) when explicitly needed for debugging or verbose operations, in which case I like to add an argument to disable it, perhaps one of fun <- function(x, verbose = FALSE) ... or fun <- function(x, quiet = TRUE) ...
- Much of the discussion below is based on the premise that what you need to do is more complicated than just incrementing or setting a column or two. If that is truly all you need to do, then I strongly suggest you learn how to do things in R in a vectorized way, as many things are easier (to read/maintain) and much more efficient.
- While numeric indexing (x[3]) works, it seems a shame (and more fragile) to use that when you have names available. If you must stay with apply, then x is a named vector, so you can do x["C3"], etc.
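To make the points above concrete, here is a minimal sketch of the original function rewritten to return its data, index by name, and print only on request. The df construction and the verbose argument are my assumptions for illustration; the logic is the asker's as stated (without the extra > 0 check), so rows where C3 is 0 are incremented to 1.

```r
# Assumed reconstruction of the example frame from the question
df <- data.frame(C1 = 1:4, C2 = c(0, 1, 0, 1), C3 = c(0, 1, 0, 1), C4 = 0)

# Returns the (possibly modified) row; prints only when verbose = TRUE
fun <- function(x, verbose = FALSE) {
  if (x["C3"] >= 10) {
    x[c("C2", "C3")] <- 0
  } else {
    x["C3"] <- x["C3"] + 1
  }
  if (verbose) print(x)
  x
}

# All columns are numeric here, so apply() hands fun a named numeric vector
res <- t(apply(df, 1, fun))
res[, "C3"]
# [1] 1 2 1 2
```

Because fun returns x rather than relying on print, the result of apply can be captured and reused.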
There's no need to use apply; R can do things much more quickly as vectors. Know that an R data.frame is mostly just a list where each column is its own vector (usually), so if you want to do something on a specific column, work on the column by itself. In this case, we can simply do:
ind <- df$C3 < 10 & df$C3 > 0
df$C3[ind] <- df$C3[ind] + 1
df$C2[!ind] <- df$C3[!ind] <- 0
df
# C1 C2 C3 C4
# 1 1 0 0 0
# 2 2 1 2 0
# 3 3 0 0 0
# 4 4 1 2 0
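If the "a data.frame is mostly just a list of column vectors" point is unfamiliar, a quick check along these lines may help. The df construction is an assumed copy of the example data.

```r
# Assumed reconstruction of the example frame from the question
df <- data.frame(C1 = 1:4, C2 = c(0, 1, 0, 1), C3 = c(0, 1, 0, 1), C4 = 0)

is.list(df)                    # a data.frame is a list under the hood
# [1] TRUE
identical(df$C3, df[["C3"]])   # each column is an ordinary vector
# [1] TRUE
df$C3 + 1                      # arithmetic applies to the whole column at once
# [1] 1 2 1 2
```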
There are further problems when using apply(df, 1, ..), however: if there are one or more character columns, then using apply(.., MARGIN=1, ..) will likely break what you are doing because it calls as.matrix(df) before doing row-wise operations. Proof:
> apply(df, 1, function(z) { browser(); z; })
Called from: FUN(newX[, i], ...)
Browse[1]> debug at #1: z
Browse[2]> z
C1 C2 C3 C4
1 0 0 0
Browse[2]> Q
> df$newcol <- "a"
> apply(df, 1, function(z) { browser(); z; })
Called from: FUN(newX[, i], ...)
Browse[1]> debug at #1: z
Browse[2]> z
C1 C2 C3 C4 newcol
"1" "0" "0" "0" "a"
Browse[2]> Q
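The same coercion can be seen without an interactive browser() session. This is a small sketch on an assumed copy of the example data; it also appears to be where the "non-numeric argument to binary operator" error in the question comes from, if the real frame has a character column.

```r
# Assumed reconstruction of the example frame from the question
df <- data.frame(C1 = 1:4, C2 = c(0, 1, 0, 1), C3 = c(0, 1, 0, 1), C4 = 0)

is.numeric(as.matrix(df))     # all-numeric frame stays numeric
# [1] TRUE

df$newcol <- "a"
is.character(as.matrix(df))   # one character column turns every cell into a string
# [1] TRUE

# Arithmetic on a row now fails; capture the message instead of stopping
msg <- tryCatch(
  apply(df, 1, function(z) z["C3"] + 1),
  error = function(e) conditionMessage(e)
)
msg
```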
If you must do things row-wise on a frame, then you can safely choose apply for it if (a) the columns you need are all the same class or you don't need to do number-ops on them; or (b) you subset the frame so that the columns you pass to apply are all the same class (non-character). If you cannot follow those, then you should likely shift to using mapply (returns simplified vector/array) or Map (returns a list). For instance, assuming that your operation here needs to be on the entire row (which I showed above that it does not), then
fun <- function(...) {
  x <- list(...)  # one row, as a named list of column values
  if (x$C3 >= 10) {
    x[c("C2", "C3")] <- 0
  } else {
    x$C3 <- x$C3 + 1
  }
  x
}
and then you need to call it specially in order to make it column-agnostic (not pre-defined on the number of columns):
df <- do.call(rbind.data.frame, do.call(Map, c(list(f = fun), df)))
df
# C1 C2 C3 C4
# 1 1 0 1 0
# 2 2 1 2 0
# 3 3 0 1 0
# 4 4 1 2 0
This seems cumbersome, though. It might be clearer to define the function so that it only works on exactly two columns at a time, so perhaps
fun2 <- function(C2, C3) {
  if (C3 >= 10) {
    C2 <- C3 <- 0
  } else {
    C3 <- C3 + 1
  }
  list(C2, C3)
}
mapply(fun2, df$C2, df$C3)
# [,1] [,2] [,3] [,4]
# [1,] 0 1 0 1
# [2,] 1 2 1 2
df[c("C2","C3")] <- asplit(mapply(fun2, df$C2, df$C3), 1)
df
# C1 C2 C3 C4
# 1 1 0 1 0
# 2 2 1 2 0
# 3 3 0 1 0
# 4 4 1 2 0
This cumbersome function, though, is only required if the operations you are doing truly cannot be vectorized.
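For completeness, option (b) mentioned earlier (subsetting to same-class columns before calling apply) might look like the sketch below. The df with its extra character column, the num_cols vector, and the fun_row name are assumptions for illustration; the logic again follows the asker's stated rule without the > 0 check.

```r
# Assumed example frame, with a character column that would break apply(df, 1, ...)
df <- data.frame(C1 = 1:4, C2 = c(0, 1, 0, 1), C3 = c(0, 1, 0, 1), C4 = 0,
                 newcol = "a")

# Only the numeric columns the function actually needs
num_cols <- c("C2", "C3")

fun_row <- function(x) {
  if (x["C3"] >= 10) {
    x[c("C2", "C3")] <- 0
  } else {
    x["C3"] <- x["C3"] + 1
  }
  x
}

# apply() sees a purely numeric matrix, so the arithmetic works;
# the result is written back into the same two columns
df[num_cols] <- t(apply(df[num_cols], 1, fun_row))
df$C3
# [1] 1 2 1 2
```

The character column never passes through as.matrix, so it is left untouched.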