Subsequent ID in list of dfs


I have a list of dfs like:

Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)

df1 <- data.frame(Name, Age)

Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
df2 <- data.frame(Name, Age)

list <- list(df1, df2)

I want to create a consecutive ID that runs across all data frames in the list. My desired output should look like:

Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
ID <- c(1:5)

df1 <- data.frame(Name, Age, ID)

Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
ID <- c(5:9)
df2 <- data.frame(Name, Age, ID)
list <- list(df1, df2)

CodePudding user response:

(I named it list1 instead of list, not wanting to confuse variables/functions :-)

I'm assuming df2 should start at nrow(df1) + 1, not at nrow(df1).

lens <- sapply(list1, nrow)
list1 <- Map(function(X, fm, len) transform(X, ID = fm + seq_len(len)),
             list1, c(0, lens[-length(lens)]), lens)
# [[1]]
#    Name Age ID
# 1   Jon  23  1
# 2  Bill  41  2
# 3 Maria  32  3
# 4   Ben  58  4
# 5  Tina  26  5
# [[2]]
#    Name Age ID
# 1   Jon  23  6
# 2  Bill  41  7
# 3 Maria  32  8
# 4   Ben  58  9
# 5  Tina  26 10
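
With exactly two data frames, the offset of the second one is simply the row count of the first. For a longer list, each offset would need to be the total number of rows in all preceding data frames. A minimal base R sketch of that generalisation, assuming cumsum() is used for the running total (the three small data frames and the name offsets are made up for illustration):

```r
# Hypothetical example data: three data frames with different row counts
list1 <- list(
  data.frame(Name = c("Jon", "Bill"), Age = c(23, 41)),
  data.frame(Name = c("Maria", "Ben", "Tina"), Age = c(32, 58, 26)),
  data.frame(Name = "Ann", Age = 30)
)

# Number of rows in each data frame
lens <- vapply(list1, nrow, integer(1))

# Offset of each data frame = total rows of all data frames before it
offsets <- c(0L, cumsum(lens)[-length(lens)])

# Add the running ID to every data frame in the list
list1 <- Map(function(X, off, len) transform(X, ID = off + seq_len(len)),
             list1, offsets, lens)

# Collect all IDs to check that they run without gaps or repeats
all_ids <- unlist(lapply(list1, `[[`, "ID"))
print(all_ids)
```

For a list of two data frames, offsets reduces to c(0, nrow(df1)), which matches the code above.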

CodePudding user response:

IIUC, this should do:

library(dplyr)

Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)

df1 <- data.frame(Name, Age) %>% mutate(origin = 'df1')

Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
df2 <- data.frame(Name, Age)  %>% mutate(origin = 'df2')

list <- bind_rows(df1, df2) %>% mutate(ID = row_number()) %>% group_split(origin)


# A tibble: 5 × 4
  Name    Age origin    ID
  <fct> <dbl> <chr>  <int>
1 Jon      23 df1        1
2 Bill     41 df1        2
3 Maria    32 df1        3
4 Ben      58 df1        4
5 Tina     26 df1        5

# A tibble: 5 × 4
  Name    Age origin    ID
  <fct> <dbl> <chr>  <int>
1 Jon      23 df2        6
2 Bill     41 df2        7
3 Maria    32 df2        8
4 Ben      58 df2        9
5 Tina     26 df2       10

You could obviously drop the origin column if you don't need it.
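
If the origin column is not wanted at all, the same bind, number and split idea can also be sketched in base R without ever adding it to the data. This is only an illustration under assumptions: the example data and the names list1, origin, combined and result are hypothetical, and all data frames are assumed to have identical columns so that rbind() works.

```r
# Hypothetical example data: two data frames with the same columns
list1 <- list(
  data.frame(Name = c("Jon", "Bill"), Age = c(23, 41)),
  data.frame(Name = c("Maria", "Ben", "Tina"), Age = c(32, 58, 26))
)

# Remember which data frame each row came from (kept outside the data)
origin <- rep(seq_along(list1), times = vapply(list1, nrow, integer(1)))

# Stack all rows, number them once, then split back by origin
combined <- do.call(rbind, list1)
combined$ID <- seq_len(nrow(combined))
result <- unname(split(combined, origin))

print(result)
```

Because origin is a separate vector rather than a column, nothing has to be dropped afterwards.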

Any reason why the second ID starts at 5 and not 6 in your example?
