One-liner to concatenate two data frames with a distinguishing column?

Time:05-18

I often find myself creating two similar data frames that I'd like to rbind together but keep track of which one each row came from with a distinguishing column. My typical motif has been

new_df <- rbind(
  cbind(df1, id="A"),
  cbind(df2, id="B")
)

which collapses nicely into a single line but feels clunky, and I'd like to do it more elegantly. I'd prefer to avoid defining the new column for each data frame separately on multiple lines, like below:

df1$id <- "A"
df2$id <- "B"
new_df <- rbind(df1, df2)

and while I know that you can make this a one-liner by playing with $<-, that tends to be much less readable than the cbind/rbind motif above. The rows also aren't guaranteed to be unique, so I can't use the classic mutate/ifelse motif I've seen recommended elsewhere:

# 'value' is not necessarily unique in the below line
new_df <- rbind(df1, df2) %>% mutate(id = ifelse(something == value, "A", "B"))

The problem usually comes up when adding a faceting variable for ggplot: I've made two data frames from different processes and would like to plot them using facets, which requires a faceting column.

What's an R-friendly way to rbind two data frames while simultaneously creating a column that tracks which data frame they came from?

CodePudding user response:

It may be easier with bind_rows from dplyr, which takes a named list and writes each name into the column given by .id:

library(dplyr)
bind_rows(list(A = df1, B = df2), .id = 'id')
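
As a minimal worked sketch of the call above, here it is with two small made-up data frames standing in for df1 and df2. The columns x and y are placeholders, not taken from the question.

```r
library(dplyr)

# Toy stand-ins for df1 and df2 (hypothetical data)
df1 <- data.frame(x = 1:2, y = c("a", "b"))
df2 <- data.frame(x = 3:4, y = c("c", "d"))

# The list names become the values of the new "id" column
new_df <- bind_rows(list(A = df1, B = df2), .id = "id")

# Named arguments also work, without wrapping them in list()
new_df2 <- bind_rows(A = df1, B = df2, .id = "id")

print(new_df)
```

The id column can then be used directly as a faceting variable, for example with facet_wrap(~ id) in ggplot2.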

CodePudding user response:

1) We can use rbind/Map from base R like this. It works with any number of data frames, although here we show just two, built from BOD, a small data frame that ships with R.

do.call("rbind", Map(data.frame, id = c("A", "B"), list(BOD, 10 * BOD)))

2) If we start out with a named list L, then the base R code would be the following.

L <- list(A = BOD, B = 10 * BOD)
do.call("rbind", Map(data.frame, id = names(L), L))

giving:

    id Time demand
A.1  A    1    8.3
A.2  A    2   10.3
A.3  A    3   19.0
A.4  A    4   16.0
A.5  A    5   15.6
A.6  A    7   19.8
B.1  B   10   83.0
B.2  B   20  103.0
B.3  B   30  190.0
B.4  B   40  160.0
B.5  B   50  156.0
B.6  B   70  198.0
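
If the A.1, A.2, ... row names are not wanted now that id carries the same information, one option is rbind's make.row.names argument. A minimal sketch, reusing the named list L from (2):

```r
L <- list(A = BOD, B = 10 * BOD)

# Prepend an id column to each data frame, one id per list name
pieces <- Map(data.frame, id = names(L), L)

# make.row.names = FALSE asks rbind to use plain 1..n row names
new_df <- do.call("rbind", c(pieces, make.row.names = FALSE))

print(head(new_df, 3))
```

The extra argument is passed through do.call alongside the data frames, so the one-liner shape of the original call is preserved.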

3) Note that a plain rbind will prefix the row names with the source name, numbered to keep them unique, if the arguments are named as shown. No id column is added; the source is recorded only in the row names.

rbind(A = BOD, B = 10 * BOD)

giving:

    Time demand
A.1    1    8.3
A.2    2   10.3
A.3    3   19.0
A.4    4   16.0
A.5    5   15.6
A.6    7   19.8
B.1   10   83.0
B.2   20  103.0
B.3   30  190.0
B.4   40  160.0
B.5   50  156.0
B.6   70  198.0
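
Since (3) stores the source only in the row names, a real column has to be derived from them if one is needed, for example for faceting. A minimal sketch, assuming that none of the source names contains a dot:

```r
x <- rbind(A = BOD, B = 10 * BOD)

# Row names look like "A.1" or "B.3"; drop everything from the first dot onward.
# Assumption: the source names themselves ("A", "B") contain no dot.
x$id <- sub("\\..*$", "", rownames(x))

print(table(x$id))
```

This is more fragile than building the id column directly as in (1) and (2), so it is probably best kept for quick inspection.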