I want to split a table based on ID column into 2 or more tables. moving the row with duplicate ID value into new table (even that other columns does not have duplicates)
table 1
ID value
1 0.94
2 0.94
3 0.20
4 0.38
5 0.71
5 0.20
6 0.34
7 0.67
8 0.89
9 0.76
10 0.26
10 0.11
11 0.92
12 0.11
13 0.72
14 0.93
15 0.70
15 0.72
15 0.70
to get like this
table 1
ID value
1 0.94
2 0.94
3 0.20
4 0.38
5 0.71
6 0.34
7 0.67
8 0.89
9 0.76
10 0.26
11 0.92
12 0.11
13 0.72
14 0.93
15 0.70
table 2
ID value
5 0.20
10 0.11
15 0.72
table 3
ID value
15 0.70
CodePudding user response:
You can split
on the rowid
of your ID
column
df <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L,
10L, 10L, 11L, 12L, 13L, 14L, 15L, 15L, 15L), value = c(0.94,
0.94, 0.2, 0.38, 0.71, 0.2, 0.34, 0.67, 0.89, 0.76, 0.26, 0.11,
0.92, 0.11, 0.72, 0.93, 0.7, 0.72, 0.7)), row.names = c(NA, -19L
), class = "data.frame")
split(df, data.table::rowid(df$ID))
#> $`1`
#> ID value
#> 1 1 0.94
#> 2 2 0.94
#> 3 3 0.20
#> 4 4 0.38
#> 5 5 0.71
#> 7 6 0.34
#> 8 7 0.67
#> 9 8 0.89
#> 10 9 0.76
#> 11 10 0.26
#> 13 11 0.92
#> 14 12 0.11
#> 15 13 0.72
#> 16 14 0.93
#> 17 15 0.70
#>
#> $`2`
#> ID value
#> 6 5 0.20
#> 12 10 0.11
#> 18 15 0.72
#>
#> $`3`
#> ID value
#> 19 15 0.7
Created on 2022-01-04 by the reprex package (v2.0.1)
CodePudding user response:
Using sequence
in rle
.
split(dat, sequence(rle(dat$ID)$lengths))
# $`1`
# ID value
# 1 1 0.94
# 2 2 0.94
# 3 3 0.20
# 4 4 0.38
# 5 5 0.71
# 7 6 0.34
# 8 7 0.67
# 9 8 0.89
# 10 9 0.76
# 11 10 0.26
# 13 11 0.92
# 14 12 0.11
# 15 13 0.72
# 16 14 0.93
# 17 15 0.70
#
# $`2`
# ID value
# 6 5 0.20
# 12 10 0.11
# 18 15 0.72
#
# $`3`
# ID value
# 19 15 0.7
Or ave
and seq_along
.
split(dat, ave(dat$ID, dat$ID, FUN=seq_along))
# $`1`
# ID value
# 1 1 0.94
# 2 2 0.94
# 3 3 0.20
# 4 4 0.38
# 5 5 0.71
# 7 6 0.34
# 8 7 0.67
# 9 8 0.89
# 10 9 0.76
# 11 10 0.26
# 13 11 0.92
# 14 12 0.11
# 15 13 0.72
# 16 14 0.93
# 17 15 0.70
#
# $`2`
# ID value
# 6 5 0.20
# 12 10 0.11
# 18 15 0.72
#
# $`3`
# ID value
# 19 15 0.7
Data:
dat <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 9L,
10L, 10L, 11L, 12L, 13L, 14L, 15L, 15L, 15L), value = c(0.94,
0.94, 0.2, 0.38, 0.71, 0.2, 0.34, 0.67, 0.89, 0.76, 0.26, 0.11,
0.92, 0.11, 0.72, 0.93, 0.7, 0.72, 0.7)), class = "data.frame", row.names = c(NA,
-19L))