Spit table based on ID column moving the dupllicate rows into new table-CodePudding

I want to split a table based on ID column into 2 or more tables. moving the row with duplicate ID value into new table (even that other columns does not have duplicates)

to get like this

table 1
ID  value
1   0.94
2   0.94
3   0.20
4   0.38
5   0.71
6   0.34
7   0.67
8   0.89
9   0.76
10  0.26
11  0.92
12  0.11
13  0.72
14  0.93
15  0.70

table 2
ID  value
5   0.20
10  0.11
15  0.72

table 3
ID  value
15  0.70

CodePudding user response：

You can split on the rowid of your ID column

df <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L, 
10L, 10L, 11L, 12L, 13L, 14L, 15L, 15L, 15L), value = c(0.94, 
0.94, 0.2, 0.38, 0.71, 0.2, 0.34, 0.67, 0.89, 0.76, 0.26, 0.11, 
0.92, 0.11, 0.72, 0.93, 0.7, 0.72, 0.7)), row.names = c(NA, -19L
), class = "data.frame")

split(df, data.table::rowid(df$ID))
#> $`1`
#>    ID value
#> 1   1  0.94
#> 2   2  0.94
#> 3   3  0.20
#> 4   4  0.38
#> 5   5  0.71
#> 7   6  0.34
#> 8   7  0.67
#> 9   8  0.89
#> 10  9  0.76
#> 11 10  0.26
#> 13 11  0.92
#> 14 12  0.11
#> 15 13  0.72
#> 16 14  0.93
#> 17 15  0.70
#> 
#> $`2`
#>    ID value
#> 6   5  0.20
#> 12 10  0.11
#> 18 15  0.72
#> 
#> $`3`
#>    ID value
#> 19 15   0.7

^{Created on 2022-01-04 by the reprex package (v2.0.1)}

CodePudding user response：

Using sequence in rle.

split(dat, sequence(rle(dat$ID)$lengths))
# $`1`
#    ID value
# 1   1  0.94
# 2   2  0.94
# 3   3  0.20
# 4   4  0.38
# 5   5  0.71
# 7   6  0.34
# 8   7  0.67
# 9   8  0.89
# 10  9  0.76
# 11 10  0.26
# 13 11  0.92
# 14 12  0.11
# 15 13  0.72
# 16 14  0.93
# 17 15  0.70
# 
# $`2`
#    ID value
# 6   5  0.20
# 12 10  0.11
# 18 15  0.72
# 
# $`3`
#    ID value
# 19 15   0.7

Or ave and seq_along.

split(dat, ave(dat$ID, dat$ID, FUN=seq_along))
# $`1`
# ID value
# 1   1  0.94
# 2   2  0.94
# 3   3  0.20
# 4   4  0.38
# 5   5  0.71
# 7   6  0.34
# 8   7  0.67
# 9   8  0.89
# 10  9  0.76
# 11 10  0.26
# 13 11  0.92
# 14 12  0.11
# 15 13  0.72
# 16 14  0.93
# 17 15  0.70
# 
# $`2`
# ID value
# 6   5  0.20
# 12 10  0.11
# 18 15  0.72
# 
# $`3`
# ID value
# 19 15   0.7

Data:

dat <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L, 
10L, 10L, 11L, 12L, 13L, 14L, 15L, 15L, 15L), value = c(0.94, 
0.94, 0.2, 0.38, 0.71, 0.2, 0.34, 0.67, 0.89, 0.76, 0.26, 0.11, 
0.92, 0.11, 0.72, 0.93, 0.7, 0.72, 0.7)), class = "data.frame", row.names = c(NA, 
-19L))