How to split a dataframe into 2 by duplicated condition in R


Suppose I have a data frame named df like so:

| id   |  name | age |
| 0123 | Joe   | 20  |            
| 0123 | Kyle  | 45  |              
| 0333 | Susan | 24  |            
| 0333 | Molly | 80  |              

How can I split df into two data frames so that neither one contains duplicate id values? I am looking for a result like this:

| id   |  name | age |
| 0123 | Joe   | 20  |            
| 0333 | Susan | 24  |              

| id   |  name | age |
| 0333 | Molly | 80  |            
| 0123 | Kyle  | 45  |              

Let me know if you can help!

CodePudding user response:

You can use split with a grouping built from the running count of each id, so that the first occurrence of every id gets 1, the second gets 2, and so on:

split(df, ~ave(id, id, FUN = seq_along))
      id    name age
1  0123   Joe     20
3  0333   Susan   24

      id    name age
2  0123   Kyle    45
4  0333   Molly   80

This works no matter how many times an id is repeated. The k-th occurrence of every id goes into the k-th data frame, so no data frame contains the same id twice:

split(rbind(df, df), ~ave(id, id, FUN = seq_along))
      id    name age
1  0123   Joe     20
3  0333   Susan   24

      id    name age
2  0123   Kyle    45
4  0333   Molly   80

      id    name age
5  0123   Joe     20
7  0333   Susan   24

      id    name age
6  0123   Kyle    45
8  0333   Molly   80
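
The same grouping can also be written without the formula interface, which, as far as I know, requires R 4.1.0 or later. The following is a minimal sketch, not the answerer's exact code: it rebuilds the question's data with id stored as character (an assumption, made to keep the leading zeros) and counts occurrences over row positions rather than over id itself, so it should also work when id is a factor. The names occurrence and pieces are illustrative.

```r
# Example data from the question; id is assumed to be character
df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80),
  stringsAsFactors = FALSE
)

# occurrence[i] is k when row i holds the k-th appearance of its id
occurrence <- ave(seq_along(df$id), df$id, FUN = seq_along)

# One data frame per occurrence number; no id repeats within a piece
pieces <- split(df, occurrence)

pieces[["1"]]  # first appearance of each id
pieces[["2"]]  # second appearance of each id
```

The list is named by occurrence number, so pieces[["1"]] and pieces[["2"]] correspond to the two data frames requested in the question.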

CodePudding user response:

a_list <- split(df, duplicated(df$id))

This works only if each id occurs at most twice. a_list[["FALSE"]] holds the first occurrence of each id and a_list[["TRUE"]] holds every repeat.
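
To see the limitation, here is a small sketch with made-up data (df3 and the name "Ann" are hypothetical, not from the question) in which one id appears three times. Because duplicated() only distinguishes "first occurrence" from "repeat", both repeats land in the same piece.

```r
# Hypothetical data: id "0123" appears three times
df3 <- data.frame(
  id   = c("0123", "0123", "0123", "0333"),
  name = c("Joe", "Kyle", "Ann", "Susan"),
  stringsAsFactors = FALSE
)

a_list <- split(df3, duplicated(df3$id))

# The TRUE piece holds every repeat, so "0123" still appears twice in it
a_list[["TRUE"]]
anyDuplicated(a_list[["TRUE"]]$id) > 0
```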

CodePudding user response:

Here is a dplyr solution:

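library(dplyr)

# Keep the first row for each id
df1 <- df %>%
  distinct(id, .keep_all = TRUE)

# Rows of df that have no match in df1 (joins on all shared columns)
df2 <- anti_join(df, df1)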
> df1
   id  name age
1 123   Joe  20
2 333 Susan  24
> df2
   id  name age
1 123  Kyle  45
2 333 Molly  80
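
Note that this approach always yields exactly two pieces, so an id with three or more rows would still be duplicated in df2, and a row that is an exact copy of one kept in df1 would be dropped by the join. If that matters, one possible dplyr variant is sketched below. It is an assumption-laden illustration rather than part of the answer above: the occurrence column and the pieces name are made up for the example, and id is assumed to be character.

```r
library(dplyr)

# Example data from the question; id is assumed to be character
df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80)
)

pieces <- df %>%
  group_by(id) %>%
  mutate(occurrence = row_number()) %>%   # 1 for the first row of each id, 2 for the second, ...
  ungroup() %>%
  group_split(occurrence, .keep = FALSE)  # one tibble per occurrence number

pieces[[1]]  # first appearance of each id
pieces[[2]]  # second appearance of each id
```

The result is a list of tibbles, one per occurrence number, with the helper column removed by .keep = FALSE.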