How to split a dataframe into two by a duplicated condition in R

Suppose I have a data frame named df like this:

| id   | name  | age |
|------|-------|-----|
| 0123 | Joe   | 20  |
| 0123 | Kyle  | 45  |
| 0333 | Susan | 24  |
| 0333 | Molly | 80  |
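A reproducible version of this data frame might look like the sketch below. It assumes id should be stored as character so that the leading zeros in 0123 and 0333 are kept; if id is read as a number, it prints as 123 and 333 instead.

```r
# Assumption: id is character so that leading zeros are preserved
df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80)
)

str(df)
```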

How can I split this df into two data frames so that neither one contains duplicate id values? The result I'm looking for is:

| id   | name  | age |
|------|-------|-----|
| 0123 | Joe   | 20  |
| 0333 | Susan | 24  |

| id   | name  | age |
|------|-------|-----|
| 0333 | Molly | 80  |
| 0123 | Kyle  | 45  |

Let me know if you can help!

Answer:

You can use split() with a grouping that numbers each row's occurrence within its id (1 the first time an id appears, 2 the second time, and so on):

split(df, ~ ave(id, id, FUN = seq_along))
$`1`
    id  name age
1 0123   Joe  20
3 0333 Susan  24

$`2`
    id  name age
2 0123  Kyle  45
4 0333 Molly  80
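To see what split() is grouping on, the occurrence vector can be computed on its own. This is a variant of the call above, not the answer's exact code: it passes seq_len(nrow(df)) instead of id as the first argument of ave(), so the counter stays an integer even when id is character (a character counter would sort "10" before "2" if an id ever repeated ten or more times), and it gives split() a plain vector rather than a formula. The df definition is an assumed reconstruction of the question's data.

```r
df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80)
)

# Occurrence number of each row within its id: 1 the first time an id
# appears, 2 the second time, and so on. An integer first argument keeps
# the result integer, so split() orders the groups numerically.
occurrence <- ave(seq_len(nrow(df)), df$id, FUN = seq_along)
print(occurrence)  # [1] 1 2 1 2

# One data frame per occurrence number; none of them repeats an id
parts <- split(df, occurrence)
parts[["1"]]
parts[["2"]]
```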

This works no matter how many times an id is repeated: the k-th occurrence of every id goes into the k-th data frame, so no data frame contains the same id twice. For example, with every row doubled:

split(rbind(df, df), ~ ave(id, id, FUN = seq_along))
$`1`
    id  name age
1 0123   Joe  20
3 0333 Susan  24

$`2`
    id  name age
2 0123  Kyle  45
4 0333 Molly  80

$`3`
    id  name age
5 0123   Joe  20
7 0333 Susan  24

$`4`
    id  name age
6 0123  Kyle  45
8 0333 Molly  80
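The same rule also covers ids that repeat a different number of times. The sketch below adds one hypothetical row (Ann, age 33) so that 0123 appears three times and 0333 only twice, and uses the integer-counter variant of ave() as a plain vector for split(), which is an adaptation rather than the answer's exact call. The number of parts equals the largest repeat count, and later parts can be smaller.

```r
df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80)
)

# Hypothetical extra row: id "0123" now appears three times, "0333" twice
df_extra <- rbind(df, data.frame(id = "0123", name = "Ann", age = 33))

occurrence <- ave(seq_len(nrow(df_extra)), df_extra$id, FUN = seq_along)
parts <- split(df_extra, occurrence)

length(parts)            # [1] 3
max(table(df_extra$id))  # [1] 3

# Rows per part: the third part only holds the extra "0123" row
vapply(parts, nrow, integer(1))

# Check that no part repeats an id
vapply(parts, function(p) anyDuplicated(p$id) == 0, logical(1))
```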

Answer:

a_list <- split(df, duplicated(df$id))

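This always produces exactly two data frames: "FALSE" holds the first occurrence of each id and "TRUE" holds every later occurrence. It therefore only meets the requirement when each id occurs at most twice; with three or more occurrences, the "TRUE" group itself contains duplicate ids.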
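A short sketch of both parts, plus the failure case. The df definition is an assumed reconstruction of the question's data, and the extra row (Ann, age 33) is hypothetical, added only to give id 0123 a third occurrence.

```r
df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80)
)

a_list <- split(df, duplicated(df$id))
names(a_list)                # "FALSE" "TRUE"
first  <- a_list[["FALSE"]]  # first occurrence of each id
others <- a_list[["TRUE"]]   # every later occurrence

# Hypothetical third row for id "0123": the "TRUE" group now repeats "0123"
df3 <- rbind(df, data.frame(id = "0123", name = "Ann", age = 33))
b_list <- split(df3, duplicated(df3$id))
anyDuplicated(b_list[["TRUE"]]$id) > 0  # TRUE
```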

Answer:

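Here is a dplyr solution. Like the duplicated() approach, it yields two data frames, so it assumes each id appears at most twice: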

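library(dplyr)

# Keep the first row for each id
df1 <- df %>%
  distinct(id, .keep_all = TRUE)

# Every row not in df1, i.e. the later occurrences of each id
df2 <- anti_join(df, df1, by = c("id", "name", "age"))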

> df1
   id  name age
1 123   Joe  20
2 333 Susan  24
> df2
   id  name age
1 123  Kyle  45
2 333 Molly  80
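If an id can appear more than twice, the occurrence-number idea from the split() answer can also be written in dplyr. This is a sketch assuming a recent dplyr (1.0 or later) that provides group_split(); the helper column occurrence is an illustrative name, and the result is a list of tibbles rather than data frames with the original row names.

```r
library(dplyr)

df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80)
)

# Number each row within its id, then split on that number
parts <- df %>%
  group_by(id) %>%
  mutate(occurrence = row_number()) %>%
  ungroup() %>%
  group_split(occurrence)

parts[[1]]  # first occurrences: Joe and Susan
parts[[2]]  # second occurrences: Kyle and Molly
```

The ungroup() call matters: group_split() is given the occurrence column to split on, and that grouping should replace the grouping by id rather than be combined with it.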