I am looking to copy all the values of one column and paste them beneath an existing column within R. The thing is that these columns are on different sheets and it can't include adding a column, rather adding to the bottom of the existing column. There are values below those shown below and there are numerous other columns on both Sheets mentioned. It would replicate this:
Current:
SheetA$NumberOfClients SheetB$ClientTotal
20 16
10 19
23 12
24 27
25 31
26 ...
19
17
...
Need to Get To:
SheetA$NumberOfClients
20
10
23
24
25
26
19
17
16
19
12
27
31
...
I have attempted to use rbind as well as the following code, but have received an "Error in libxlsxwriter: 'Worksheet row or column index out of range.'" when attempting to write the data frame as an xlsx file (note it is NOT too large of a file):
SheetA$NumberOfClients <- data.frame(SheetA$NumberOfClients, SheetB$ClientTotal)
CodePudding user response:
To append these to the column NumberOfClients
in SheetA
you will need more rows than SheetA
currently has. This structure is weird, but if for whatever reason you wanted to do this, you could do:
SheetA[(nrow(SheetA) 1):(nrow(SheetA) nrow(SheetB)), "NumberOfClients"] <- SheetB$ClientTotal
Since there is no sample data, I cant test but in the example:
a <- data.frame(col1 = 1:20,
col2 = LETTERS[1:20])
b <- data.frame(col1 = 1:10,
col2 = LETTERS[1:10])
a[(nrow(a) 1):(nrow(a) nrow(b)), "col1"] <- b$col1
CodePudding user response:
Since it appears that your frames have other columns not included here, I think your easiest option is to use dplyr::bind_rows
or data.table::rbindlist
, since they are much more user-friendly when it comes to additional columns and/or different order of columns.
Reproducible data, with an extra column thrown in for good measure.
SheetA <- structure(list(NumberOfClients = c(20L, 10L, 23L, 24L, 25L, 26L, 19L, 17L), other = c(0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -8L), class = "data.frame")
SheetB <- structure(list(ClientTotal = c(16L, 19L, 12L, 27L, 31L)), class = "data.frame", row.names = c(NA, -5L))
No row-combining function is going to be able to infer how you want to combine columns of different names, so you need to be explicit and change the names so they match:
names(SheetB) <- "NumberOfClients"
From here, using rbind
itself, given your comment, is going to fail:
rbind(SheetA, SheetB)
# Error in rbind(deparse.level, ...) :
# numbers of columns of arguments do not match
base R
SheetB2 <- cbind(SheetB, subset(SheetA, select = setdiff(colnames(SheetA), colnames(SheetB)))[1,,FALSE][NA,,FALSE])
# Warning in data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
rbind(SheetA, SheetB2)
# NumberOfClients other
# 1 20 0
# 2 10 0
# 3 23 0
# 4 24 0
# 5 25 0
# 6 26 0
# 7 19 0
# 8 17 0
# 9 16 NA
# 10 19 NA
# 11 12 NA
# 12 27 NA
# 13 31 NA
The subset(.., setdiff(..))
part is to extra columns in SheetA
that we are missing in SheetB
. The [1,,FALSE]
gives us just 1 row of that. The [NA,,FALSE]
gives us the appropriate "NA" for each of those columns. In this case, that portion (before the call to cbind
) effectively returns data.frame(other=NA_real_)
. The reason to go through this? R has at least six different types of NA
, and this is just a clear/declarative way to make sure that we get the correct one. If we don't, then it's possible one of the new columns will be coerced to a class that is different from the original SheetA
. (This might be a bit paranoid, but it is borne from many sessions of troubleshooting.)
dplyr
dplyr::bind_rows(SheetA, SheetB)
# NumberOfClients other
# 1 20 0
# 2 10 0
# 3 23 0
# 4 24 0
# 5 25 0
# 6 26 0
# 7 19 0
# 8 17 0
# 9 16 NA
# 10 19 NA
# 11 12 NA
# 12 27 NA
# 13 31 NA
data.table
data.table::rbindlist(list(SheetA, SheetB), fill = TRUE)
# NumberOfClients other
# <int> <num>
# 1: 20 0
# 2: 10 0
# 3: 23 0
# 4: 24 0
# 5: 25 0
# 6: 26 0
# 7: 19 0
# 8: 17 0
# 9: 16 NA
# 10: 19 NA
# 11: 12 NA
# 12: 27 NA
# 13: 31 NA