Why does R allow new column creation via tail-slicing?

Time:06-15

If I create a data frame

df = data.frame(a=c(1,2,3), b=c(4,5,6))

why does this fail?

df$z[c(1,2)] = c(7,8)
Error in `$<-.data.frame`(`*tmp*`, z, value = c(7, 8)) : 
replacement has 2 rows, data has 3

But this works?

df$z[c(2,3)] = c(7,8)
df
  a b  z
1 1 4 NA
2 2 5  7
3 3 6  8

CodePudding user response:

The error comes from the $<-.data.frame method:

> `$<-.data.frame`
function (x, name, value) 
{
    cl <- oldClass(x)
    class(x) <- NULL
    nrows <- .row_names_info(x, 2L)
    if (!is.null(value)) {
        N <- NROW(value)
        if (N > nrows) 
            stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
                "replacement has %d rows, data has %d"), N, nrows), 
                domain = NA)
...

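Here N is 2 and nrows is 3, so the N > nrows check shown above does not fire:

> NROW(c(7, 8))
[1] 2
> .row_names_info(df, 2L)
[1] 3

The same message is raised by the check that follows in the elided part, for N < nrows: a shorter value is only recycled when nrows is a multiple of N, and 3 is not a multiple of 2. traceback() confirms that the stop() call comes from $<-.data.frame: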

> traceback()
3: stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
       "replacement has %d rows, data has %d"), N, nrows), domain = NA)
2: `$<-.data.frame`(`*tmp*`, z, value = c(7, 8))
1: `$<-`(`*tmp*`, z, value = c(7, 8))
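
To see why the two indexings differ, it may help to look at the intermediate vector R builds before `$<-.data.frame` is ever called. The following is a minimal sketch, assuming the missing column starts out as NULL; the names head_fill, tail_fill, z1, and z2 are only for illustration.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# df$z does not exist yet, so the indexed assignment starts from NULL
head_fill <- NULL
head_fill[c(1, 2)] <- c(7, 8)   # grows only to length 2
tail_fill <- NULL
tail_fill[c(2, 3)] <- c(7, 8)   # grows to length 3, position 1 is filled with NA

# Assign each intermediate vector as a whole column and record whether it succeeds
head_ok <- tryCatch({ df$z1 <- head_fill; TRUE }, error = function(e) FALSE)
tail_ok <- tryCatch({ df$z2 <- tail_fill; TRUE }, error = function(e) FALSE)
```

Only the length-3 vector fits the 3-row data frame; the length-2 vector is rejected because 3 is not a multiple of 2.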

CodePudding user response:

When the column does not exist yet, R first builds the vector z from the indexed assignment and then adds it to the data.frame. That vector must have as many elements as the data.frame has rows, or a length that divides the number of rows evenly, in which case it is recycled.

It's clearer what is happening if you work with a list object instead:

df <- list(a = 1:3, b = 4:6)
df$z1[1:2] <- 7:8
df$z2[2:3] <- 7:8
df$z3[c(1,3)] <- 7:8
df
#> $a
#> [1] 1 2 3
#> 
#> $b
#> [1] 4 5 6
#> 
#> $z1
#> [1] 7 8
#> 
#> $z2
#> [1] NA  7  8
#> 
#> $z3
#> [1]  7 NA  8
data.frame(df)
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 3, 2
data.frame(df[-3])
#>   a b z2 z3
#> 1 1 4 NA  7
#> 2 2 5  7 NA
#> 3 3 6  8  8
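
If the short element has to go into the data frame as well, one option is to pad it with NA first. This is a small sketch, assuming padding at the end is what is wanted; the list is renamed lst here to avoid confusion with a real data frame.

```r
lst <- list(a = 1:3, b = 4:6)
lst$z1[1:2] <- 7:8        # built from NULL, so only length 2

# `length<-` extends a vector and fills the new positions with NA
length(lst$z1) <- 3

out <- data.frame(lst)    # all elements now have 3 entries
```

After padding, every element has the same length and the conversion goes through.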

CodePudding user response:

Just to add one remark:

df = data.frame(a=c(1, 2, 3, 4),
                b=c(4, 5, 6, 6))

df[c(2, 3), 3] <- 1 # works: adds a third column, NA in rows 1 and 4

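It appears that with df$something, the assignment goes through when it reaches the last row:

df$z[4] <- 1 # works

But on a df that has no z column yet, an index that stops short of the last row is not reliable. With 4 rows, index 3 fails; indices 1 and 2 run without error, but only because the shorter vector is recycled to 4 rows:

df$z[3] <- 1 # does not work: replacement has 3 rows, data has 4

However, if you do it in two steps, any index from 1 to 4 works:

df$z <- 1
df$z[3] <- 1

So I have no complete answer, but it may be one step towards it.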
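
The per-index behaviour can be checked directly. This is a rough sketch that tries each index on a fresh copy of a 4-row data frame without a z column; the names base and works are only for illustration.

```r
base <- data.frame(a = c(1, 2, 3, 4), b = c(4, 5, 6, 6))

# For each index, try the partial assignment on a fresh copy
works <- sapply(1:4, function(i) {
  d <- base
  tryCatch({ d$z[i] <- 1; TRUE }, error = function(e) FALSE)
})
names(works) <- 1:4
works
```

The vector built from NULL has length i, so the assignment should only be accepted when 4 is a multiple of i. For lengths 1 and 2 the values are silently recycled, which may not be the intended result.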
