Add a value as a new row to a dataframe but keep all other columns NA

Here is my sample dataset:

mydata = data.frame (ID =c(1,2,3,4,5),
subject = c("His","Geo","Geo","His","Geo"),
age = c(21,24,26,23,26))

I would like to add a row at the top. I would like it to say "School 1" in the ID column while all other columns remain blank. The following is what I am looking for:

mydata = data.frame (ID =c("School 1",1,2,3,4,5),
subject = c(NA,"His","Geo","Geo","His","Geo"),
age = c(NA,21,24,26,23,26))

I have tried the following, but it ends up populating the value across all columns:

mydata <- rbind(c("School 1"), mydata)

I know the following code will get me what I want, but I would like to avoid having to list out the NAs, as my dataset has tons of columns:

mydata <- rbind(c("School 1", NA,NA), mydata)

Any help is appreciated!

CodePudding user response：

A possible solution, based on dplyr. We first need to convert ID from numeric to character.

library(dplyr)

mydata %>% 
  mutate(ID = as.character(ID)) %>% 
  bind_rows(list(ID = "School 1"), .)

#> # A tibble: 6 × 3
#>   ID       subject   age
#>   <chr>    <chr>   <dbl>
#> 1 School 1 <NA>       NA
#> 2 1        His        21
#> 3 2        Geo        24
#> 4 3        Geo        26
#> 5 4        His        23
#> 6 5        Geo        26
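
As a variation on the same idea (not part of the answer above), `add_row()` from the tibble package can place the new row directly. This is a minimal sketch, assuming tibble is installed alongside dplyr; columns that are not named in the call are filled with NA.

```r
library(dplyr)
library(tibble)

mydata <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  subject = c("His", "Geo", "Geo", "His", "Geo"),
  age = c(21, 24, 26, 23, 26)
)

# ID must be character before a text label can be stored in it
result <- mydata %>%
  mutate(ID = as.character(ID)) %>%
  add_row(ID = "School 1", .before = 1)  # unnamed columns become NA

result
```

Here `.before = 1` puts the new row at the top; leaving it out would append the row at the bottom instead.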

CodePudding user response：

`length<-` fills up non-existing elements with NA up to a specified total length. With it you can create a vector whose length is exactly ncol(mydata) and whose first element is 'School 1', then rbind it.

rbind(`length<-`("School 1", ncol(mydata)), mydata)
#         ID subject age
# 1 School 1    <NA>  NA
# 2        1     His  21
# 3        2     Geo  24
# 4        3     Geo  26
# 5        4     His  23
# 6        5     Geo  26
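
To see what `length<-` does on its own, here is a small sketch that pads a length-one character vector to length 3. The variable names `x` and `padded` are only illustrative.

```r
# Usual assignment form: extending the length pads with NA
x <- "School 1"
length(x) <- 3
x

# Functional form, as used inside rbind() above
padded <- `length<-`("School 1", 3)

# Both forms give the same vector
identical(x, padded)
```

The functional form is convenient here because it returns the padded vector directly, so it can be passed straight to rbind without an intermediate variable.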

Explanation

To better understand OP's problem, it is worth thinking about what a data frame is. It is actually a modified list,

typeof(mydata)
# [1] "list"

and consists of vectors as elements with equal lengths, which we can see when we unclass it.

unclass(mydata)
# $ID
# [1] 1 2 3 4 5
# 
# $subject
# [1] "His" "Geo" "Geo" "His" "Geo"
# 
# $age
# [1] 21 24 26 23 26
# 
# attr(,"row.names")
# [1] 1 2 3 4 5

We may easily add other elements to a list,

c(mydata, foo='something')
# $ID
# [1] 1 2 3 4 5
# 
# $subject
# [1] "His" "Geo" "Geo" "His" "Geo"
# 
# $age
# [1] 21 24 26 23 26
# 
# $foo
# [1] "something"

but when we make a data frame out of it, the values get recycled if nothing else is provided (which is actually very useful).

as.data.frame(c(mydata, foo='something'))
#   ID subject age       foo
# 1  1     His  21 something
# 2  2     Geo  24 something
# 3  3     Geo  26 something
# 4  4     His  23 something
# 5  5     Geo  26 something

This is exactly the same with cbind.

cbind(mydata, foo='something')
#   ID subject age       foo
# 1  1     His  21 something
# 2  2     Geo  24 something
# 3  3     Geo  26 something
# 4  4     His  23 something
# 5  5     Geo  26 something

If we provide a vector of appropriate length (i.e. a column to the data frame), R has no reason to recycle.

as.data.frame(c(mydata, list(foo=c('something', rep(NA, 4)))))
#   ID subject age       foo
# 1  1     His  21 something
# 2  2     Geo  24      <NA>
# 3  3     Geo  26      <NA>
# 4  4     His  23      <NA>
# 5  5     Geo  26      <NA>

cbind(mydata, foo=c('something', rep(NA, 4)))
#   ID subject age       foo
# 1  1     His  21 something
# 2  2     Geo  24      <NA>
# 3  3     Geo  26      <NA>
# 4  4     His  23      <NA>
# 5  5     Geo  26      <NA>
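
One caveat on recycling, sketched below: a single value is recycled to fill the column, but a vector whose length does not divide the number of rows is rejected. The column name `foo` and the values are only illustrative.

```r
mydata <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  subject = c("His", "Geo", "Geo", "His", "Geo"),
  age = c(21, 24, 26, 23, 26)
)

# One value for five rows: recycled into every row
ok <- cbind(mydata, foo = "something")

# Two values for five rows: 5 is not a multiple of 2, so this is
# expected to signal an error, which we capture instead of stopping
res <- tryCatch(
  cbind(mydata, foo = c("a", "b")),
  error = function(e) e
)
inherits(res, "error")
```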

Adding rows is slightly different. As the unclassed data frame above shows, we need to append something to each single vector at the desired position. It goes against the grain, so to speak. This is also computationally more expensive, and thus much slower.

as.data.frame(Map(append, mydata, values=c('something', rep(NA, 2)), after=0))
#          ID subject  age
# 1 something    <NA> <NA>
# 2         1     His   21
# 3         2     Geo   24
# 4         3     Geo   26
# 5         4     His   23
# 6         5     Geo   26

Notice that appending a vector shorter than ncol(mydata) will also result in recycling, as experienced by OP.

as.data.frame(Map(append, mydata, values='something', after=0))
#          ID   subject       age
# 1 something something something
# 2         1       His        21
# 3         2       Geo        24
# 4         3       Geo        26
# 5         4       His        23
# 6         5       Geo        26

R's rbind already takes care of this internally, which is fast, so we don't need to Map over anything:

rbind(c("something", NA, NA), mydata)

To avoid endlessly typing NA, we may thus use the proposed solution:

rbind(`length<-`("School 1", ncol(mydata)), mydata)
#         ID subject  age
# 1 School 1    <NA> <NA>
# 2        1     His   21
# 3        2     Geo   24
# 4        3     Geo   26
# 5        4     His   23
# 6        5     Geo   26
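
The padded vector is a character vector, so prepending it can turn other columns into character as well (the age column prints `<NA>` in the Map example above). If the original column types matter, one possible alternative is to prepend a one-row data frame instead of a vector. This is a minimal sketch, not taken from the answers above; `prepend_labelled_row` is a hypothetical helper name.

```r
mydata <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  subject = c("His", "Geo", "Geo", "His", "Geo"),
  age = c(21, 24, 26, 23, 26),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a one-row data frame that is NA everywhere
# except in the named column, then rbind it on top of the data.
prepend_labelled_row <- function(df, column, value) {
  new_row <- lapply(df, function(col) NA)  # one logical NA per column
  new_row[[column]] <- value
  new_row <- as.data.frame(new_row, stringsAsFactors = FALSE)
  names(new_row) <- names(df)              # keep the original column names
  out <- rbind(new_row, df)
  rownames(out) <- NULL                    # renumber the rows from 1
  out
}

result <- prepend_labelled_row(mydata, "ID", "School 1")
str(result)
```

Only the labelled column has to change type here; a logical NA can sit in a column of any type, so the remaining columns are free to keep theirs.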