Here is my sample dataset:
mydata = data.frame (ID =c(1,2,3,4,5),
subject = c("His","Geo","Geo","His","Geo"),
age = c(21,24,26,23,26))
I would like to add a row at the top. I would like it to say "School 1" in the ID column while all other columns remain blank. The following is what I am looking for:
mydata = data.frame (ID =c("School 1",1,2,3,4,5),
subject = c(NA,"His","Geo","Geo","His","Geo"),
age = c(NA,21,24,26,23,26))
I have tried the following, but it ends up populating the value across all columns:
mydata <- rbind(c("School 1"), mydata)
I know the following code will get me what I want, but I would like to avoid having to list out the NAs, as my dataset has tons of columns:
mydata <- rbind(c("School 1", NA,NA), mydata)
Any help is appreciated!
CodePudding user response:
A possible solution, based on dplyr. We first need to convert ID from numeric to character.
library(dplyr)
mydata %>%
mutate(ID = as.character(ID)) %>%
bind_rows(list(ID = "School 1"), .)
#> # A tibble: 6 × 3
#> ID subject age
#> <chr> <chr> <dbl>
#> 1 School 1 <NA> NA
#> 2 1 His 21
#> 3 2 Geo 24
#> 4 3 Geo 26
#> 5 4 His 23
#> 6 5 Geo 26
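As a rough illustration of why the conversion step matters, the sketch below assumes a recent dplyr, where bind_rows refuses to combine a character ID with a numeric one. The names attempt and fixed are only illustrative.

```r
library(dplyr)

mydata <- data.frame(ID = c(1, 2, 3, 4, 5),
                     subject = c("His", "Geo", "Geo", "His", "Geo"),
                     age = c(21, 24, 26, 23, 26))

# Without the conversion, ID is character in one input and double in the other,
# so we capture the resulting condition instead of stopping the script.
attempt <- tryCatch(
  bind_rows(list(ID = "School 1"), mydata),
  error = function(e) e
)
inherits(attempt, "error")

# After the conversion both ID columns are character, so the bind succeeds.
fixed <- bind_rows(list(ID = "School 1"),
                   mutate(mydata, ID = as.character(ID)))
nrow(fixed)
```

Columns that are missing from the new row (subject and age here) are filled with NA and keep their original types.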
CodePudding user response:
Using `length<-`, which fills up non-existing elements with NA up to a specified total length, you may create a vector of length exactly ncol(mydata) with 'School 1' as its first element, then rbind.
rbind(`length<-`("School 1", ncol(mydata)), mydata)
# ID subject age
# 1 School 1 <NA> NA
# 2 1 His 21
# 3 2 Geo 24
# 4 3 Geo 26
# 5 4 His 23
# 6 5 Geo 26
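To see what `length<-` does on its own, here is a small sketch in base R. The variable names padded and x are just for illustration.

```r
# Extending a length-1 character vector to length 3 pads it with NA.
padded <- `length<-`("School 1", 3)
padded
is.na(padded)

# The functional form above is equivalent to the usual replacement syntax.
x <- "School 1"
length(x) <- 3
identical(x, padded)
```

Because the padded vector is of type character, it may be worth checking the column classes of the rbind result (for example with sapply(result, class)) if you rely on age staying numeric.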
Explanation
It may be worth thinking about the concept of a data frame to better understand OP's problem. It is actually a modified list,
typeof(mydata)
# [1] "list"
and consists of vectors of equal length as elements, which we can see when we unclass it.
unclass(mydata)
# $ID
# [1] 1 2 3 4 5
#
# $subject
# [1] "His" "Geo" "Geo" "His" "Geo"
#
# $age
# [1] 21 24 26 23 26
#
# attr(,"row.names")
# [1] 1 2 3 4 5
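The "list of equal-length vectors" view can also be checked programmatically. This is a minimal sketch using only base R functions:

```r
mydata <- data.frame(ID = c(1, 2, 3, 4, 5),
                     subject = c("His", "Geo", "Geo", "His", "Geo"),
                     age = c(21, 24, 26, 23, 26))

# A data frame is a list, with one element per column.
is.list(mydata)
length(mydata) == ncol(mydata)

# Every element (column) has the same length, namely the number of rows.
lengths(mydata)
all(lengths(mydata) == nrow(mydata))
```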
We may easily add other elements to a list,
c(mydata, foo='something')
# $ID
# [1] 1 2 3 4 5
#
# $subject
# [1] "His" "Geo" "Geo" "His" "Geo"
#
# $age
# [1] 21 24 26 23 26
#
# $foo
# [1] "something"
but when we make a data frame out of it, the values get recycled if nothing else is provided (which is actually very useful).
as.data.frame(c(mydata, foo='something'))
# ID subject age foo
# 1 1 His 21 something
# 2 2 Geo 24 something
# 3 3 Geo 26 something
# 4 4 His 23 something
# 5 5 Geo 26 something
This is exactly the same with cbind.
cbind(mydata, foo='something')
# ID subject age foo
# 1 1 His 21 something
# 2 2 Geo 24 something
# 3 3 Geo 26 something
# 4 4 His 23 something
# 5 5 Geo 26 something
If we provide a vector of appropriate length (i.e. a column to the data frame), R has no reason to recycle.
as.data.frame(c(mydata, list(foo=c('something', rep(NA, 4)))))
# ID subject age foo
# 1 1 His 21 something
# 2 2 Geo 24 <NA>
# 3 3 Geo 26 <NA>
# 4 4 His 23 <NA>
# 5 5 Geo 26 <NA>
cbind(mydata, foo=c('something', rep(NA, 4)))
# ID subject age foo
# 1 1 His 21 something
# 2 2 Geo 24 <NA>
# 3 3 Geo 26 <NA>
# 4 4 His 23 <NA>
# 5 5 Geo 26 <NA>
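Recycling only applies when the shorter vector fits into the number of rows a whole number of times. The sketch below contrasts a length-1 value with a length-2 value for the 5-row data frame; ok and bad are illustrative names.

```r
mydata <- data.frame(ID = c(1, 2, 3, 4, 5),
                     subject = c("His", "Geo", "Geo", "His", "Geo"),
                     age = c(21, 24, 26, 23, 26))

# A length-1 value is recycled to all 5 rows.
ok <- cbind(mydata, foo = "something")
ok$foo

# A length-2 value does not fit evenly into 5 rows, so this fails;
# tryCatch captures the error instead of stopping the script.
bad <- tryCatch(cbind(mydata, foo = c("a", "b")), error = function(e) e)
inherits(bad, "error")
```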
Adding rows is slightly different. As we can easily see in the unclassed data frame above, we would need to append something to each single vector at the desired position. It goes against the grain, so to speak. Obviously this is also computationally more expensive, and thus much slower.
as.data.frame(Map(append, mydata, values=c('something', rep(NA, 2)), after=0))
# ID subject age
# 1 something <NA> <NA>
# 2 1 His 21
# 3 2 Geo 24
# 4 3 Geo 26
# 5 4 His 23
# 6 5 Geo 26
Notice that appending a vector shorter than ncol will also result in recycling, as experienced by OP.
as.data.frame(Map(append, mydata, values='something', after=0))
# ID subject age
# 1 something something something
# 2 1 His 21
# 3 2 Geo 24
# 4 3 Geo 26
# 5 4 His 23
# 6 5 Geo 26
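The column-wise idea can be wrapped in a small helper that takes only the columns you want to fill and uses a logical NA for the rest. prepend_row is a hypothetical name, not a base R function, and this is only a sketch under the assumption that all columns are plain atomic vectors.

```r
mydata <- data.frame(ID = c(1, 2, 3, 4, 5),
                     subject = c("His", "Geo", "Geo", "His", "Geo"),
                     age = c(21, 24, 26, 23, 26))

# Hypothetical helper: prepend one row, filling unnamed columns with NA.
prepend_row <- function(df, ...) {
  vals <- list(...)
  stopifnot(all(names(vals) %in% names(df)))
  # One value per column: the supplied value if given, otherwise a logical NA.
  new <- lapply(names(df), function(nm) {
    if (nm %in% names(vals)) vals[[nm]] else NA
  })
  # Put the new value in front of each column vector.
  as.data.frame(Map(function(col, v) c(v, col), df, new),
                stringsAsFactors = FALSE)
}

res <- prepend_row(mydata, ID = "School 1")
res
sapply(res, class)
```

Since a logical NA is coerced to whatever type the column already has, untouched columns such as age keep their type; only ID becomes character because of the "School 1" value.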
R's rbind already takes care of this in C, which is fast, and we don't need to Map over anything:
rbind(c("something", NA, NA), mydata)
To avoid endless typing of NA, we may thus use the proposed solution:
rbind(`length<-`("School 1", ncol(mydata)), mydata)
# ID subject age
# 1 School 1 <NA> <NA>
# 2 1 His 21
# 3 2 Geo 24
# 4 3 Geo 26
# 5 4 His 23
# 6 5 Geo 26
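The benefit shows up with many columns. The following sketch builds a made-up 51-column data frame (wide is purely illustrative) and prepends the label row without typing a single NA:

```r
# Illustrative wide data frame: an ID column plus 50 numeric columns V1..V50.
wide <- as.data.frame(matrix(0, nrow = 2, ncol = 50))
wide <- cbind(ID = c(1, 2), wide)

# Pad "School 1" with NA up to the number of columns, then bind it on top.
res <- rbind(`length<-`("School 1", ncol(wide)), wide)

dim(res)
res[1, 1:3]

# Inspect the resulting column classes before doing arithmetic on them.
sapply(res, class)[1:3]
```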