I have a vector, called missing, of sample IDs that must be present in my dataframe (otherwise the function I am applying to them doesn't work) but are not.
For each element of missing, I want to append a row to the end of my dataframe that contains the ID, with NA in every other column.
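If missing still has to be built from the full list of required IDs, one way is setdiff(). This is a minimal sketch: required_ids is a hypothetical name, and it assumes the ID sits in the first column of df, as in the loop that follows.

```r
# Hypothetical inputs: the IDs the downstream function needs, and the current data
required_ids <- c(1, 2, 3, 11, 13, 15)
df <- data.frame(id = 1:10, val1 = 11:20)

# IDs that are required but absent from the first column of df
missing <- setdiff(required_ids, df[[1]])
missing
```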
What I am currently trying, based on some other Stack Overflow posts I saw that talk only about adding empty rows, is as follows:
for (element in missing) {
  df[nrow(df) + 1, ] <- NA      # append a row of NAs
  df[nrow(df), 1] <- element    # put the ID in the first column
}
Is there a simpler, faster way to do this? The loop already takes noticeable time for 1,000 missing elements, and I may later have to handle many more.
Answer:
Sample data:
samp <- data.frame(id = 1:10, val1 = 11:20, val2 = 21:30)
missing <- c(11, 13, 15)
Merge:
merge(samp, data.frame(id = missing), by = "id", all = TRUE)
#    id val1 val2
# 1   1   11   21
# 2   2   12   22
# 3   3   13   23
# 4   4   14   24
# 5   5   15   25
# 6   6   16   26
# 7   7   17   27
# 8   8   18   28
# 9   9   19   29
# 10 10   20   30
# 11 11   NA   NA
# 12 13   NA   NA
# 13 15   NA   NA
Row-bind with an external package:
data.table::rbindlist(list(samp, data.frame(id = missing)), use.names = TRUE, fill = TRUE)
dplyr::bind_rows(samp, data.frame(id = missing))
Row-bind with base R, a little more work:
samp0 <- samp[rep(1, length(missing)), , drop = FALSE][NA, ]  # one all-NA row per missing ID
samp0$id <- missing                                            # fill in the IDs
rownames(samp0) <- NULL
rbind(samp, samp0)
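The base R approach above can be wrapped in a reusable function that appends all rows with a single rbind instead of one assignment per ID. This is a minimal sketch: append_na_rows and its id_col argument are hypothetical names, not part of any package, and skipping IDs that are already present is an added assumption.

```r
# Hypothetical helper: append one all-NA row per ID, then fill in the ID column
append_na_rows <- function(df, ids, id_col = names(df)[1]) {
  ids <- setdiff(ids, df[[id_col]])   # assumption: skip IDs that are already present
  if (length(ids) == 0) return(df)
  # Indexing with NA_integer_ yields all-NA rows that keep each column's type
  new_rows <- df[rep(NA_integer_, length(ids)), , drop = FALSE]
  new_rows[[id_col]] <- ids
  out <- rbind(df, new_rows)          # a single rbind for all new rows
  rownames(out) <- NULL
  out
}

samp <- data.frame(id = 1:10, val1 = 11:20, val2 = 21:30)
res <- append_na_rows(samp, c(11, 13, 15, 2))  # 2 is already present, so it is skipped
tail(res, 4)
```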
Answer:
1) Using the built-in anscombe data frame, this inserts two rows, putting -1 and -3 in the x1 column.
library(tibble)
new <- c(-1, -3)
add_row(anscombe, x1 = new)
giving:
   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89
12 -1 NA NA NA    NA   NA    NA    NA
13 -3 NA NA NA    NA   NA    NA    NA
2) Here is a base R solution; new is from (1). If overwriting anscombe is acceptable, omit the first line and replace anscombe2 with anscombe, although keeping the original intact typically makes debugging easier.
anscombe2 <- anscombe
# Row indices past nrow() append new rows; columns other than x1 are filled with NA
anscombe2[nrow(anscombe2) + seq_along(new), "x1"] <- new
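The same assignment idiom can replace the question's loop with one vectorized step. A minimal sketch, reusing the samp and missing sample data from the first answer for illustration and assuming the ID column is named id:

```r
samp <- data.frame(id = 1:10, val1 = 11:20, val2 = 21:30)
missing <- c(11, 13, 15)

# Row indices past nrow(samp) create new rows; every column except "id" is filled with NA
samp[nrow(samp) + seq_along(missing), "id"] <- missing
tail(samp, 4)
```

Because all new rows are created in a single assignment, the data frame is expanded once rather than once per missing ID.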
3) Using the dplyr package, we can use rows_insert (dplyr also re-exports tibble()). new is from (1).
library(dplyr)
rows_insert(anscombe, tibble(x1 = new), by = "x1")